ramblinpeck has asked for the wisdom of the Perl Monks concerning the following question:

Trying to convert all the hex chars in a file to something I can read, from %61 to a for example

currently I have:

perl -i.bak -pe 's/\%([0-7][0-9-A-F])/char(hex($1))/g' <filename>

but this is just giving me a file with 'char(hex(61))' in place of '%61' instead of 'a'. Gotta be simple, but I can't for the life of me figure it out. Thanks.

Replies are listed 'Best First'.
Re: Command Line Regex
by saintmike (Vicar) on Apr 07, 2006 at 00:18 UTC
    It's already been invented:
    perl -MURI::Escape -n -e 'print uri_unescape($_)' filename
Re: Command Line Regex
by graff (Chancellor) on Apr 07, 2006 at 04:35 UTC
    You were really close -- change "char" to "chr", add "e" at the end, and drop the dash between 9 and A:
    perl -i.bak -pe 's/\%([0-7][0-9A-F])/chr(hex($1))/ge'
    Bear in mind that values %00-%1F and %7F won't be things that you can "read" very well -- in fact, if you encounter %0D without an adjacent %0A, the resulting display could be very misleading. You might want to limit the regex to:
    s/\%([2-6][0-9A-F]|7[0-9A-E])/chr(hex($1))/ge
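The restricted substitution above can be tried out in a small script. This is only a sketch: the sub name decode_printable and the sample string are made up for illustration. Note that the pattern, as written in the reply, only matches uppercase hex digits.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes for printable ASCII only
# (0x20-0x7E), leaving control characters such as %0D or %7F untouched.
sub decode_printable {
    my ($text) = @_;
    $text =~ s/%([2-6][0-9A-F]|7[0-9A-E])/chr(hex($1))/ge;
    return $text;
}

print decode_printable("%48%65%6C%6C%6F%0D%7F"), "\n";   # prints Hello%0D%7F
```

If the input may contain lowercase escapes such as %6c, adding the /i modifier would be one way to cover them.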

      Missed the dash in "0-9-A-F"; thanks for catching.

Re: Command Line Regex
by parv (Parson) on Apr 06, 2006 at 23:33 UTC

    You need the /e flag (see L<perlop(1)> for s///) so that the right-hand side is evaluated as code. Secondly, the function that turns a given number into a character is chr(), not "char()".

    UPDATE: Just noticed that the part of the regex in OP is supposed to be [0-7][0-9A-F] not [0-70-9A-F], otherwise OP could not have gotten "61" in output.
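To see what the /e flag changes, the two variants can be compared side by side. A minimal sketch; the sub names without_e and with_e are invented for this example.

```perl
use strict;
use warnings;

# Without /e the replacement is just an interpolated string:
# $1 is filled in, but chr(hex(...)) is never called.
sub without_e {
    my ($text) = @_;
    $text =~ s/%([0-7][0-9A-F])/chr(hex($1))/g;
    return $text;
}

# With /e the replacement is evaluated as Perl code for every match.
sub with_e {
    my ($text) = @_;
    $text =~ s/%([0-7][0-9A-F])/chr(hex($1))/ge;
    return $text;
}

print without_e("%61%62"), "\n";   # prints chr(hex(61))chr(hex(62))
print with_e("%61%62"), "\n";      # prints ab
```

The first line of output has the same shape as what the OP saw, apart from the "char" spelling.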

    Edit: g0n removed TT tags & replaced POD style code markup with code tags

Re: Command Line Regex
by polettix (Vicar) on Apr 07, 2006 at 00:22 UTC
    See URI::Escape for a safe and robust way to do that. Anyway, if you want that regex to work, you probably have to dig a bit into perlop, looking for the e modifier in the section about the substitution operator.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.

      Well, OP has almost what C<URI::Escape::uri_unescape()> does (after including C</e>). The only difference is that C<uri_unescape()> uses C<[0-9A-F]{2}>, which includes 128 (UPDATE: 0xff - 0x7f; previously I wrote 72, calculated very wrongly) extra characters. (For the pedantic, C<uri_unescape()> also differentiates the list context.)
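The figure of 128 can be checked by counting which two-digit escapes each character class accepts. A rough sketch, using the two patterns exactly as quoted in this thread (not taken from the module source); count_matches is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many of the 256 possible two-digit
# uppercase hex values a given pattern accepts in full.
sub count_matches {
    my ($re) = @_;
    return scalar grep { sprintf("%02X", $_) =~ /\A$re\z/ } 0 .. 255;
}

my $narrow = count_matches(qr/[0-7][0-9A-F]/);   # OP's pattern: 00-7F
my $wide   = count_matches(qr/[0-9A-F]{2}/);     # pattern quoted above: 00-FF

printf "%d vs %d, difference %d\n", $narrow, $wide, $wide - $narrow;
# prints 128 vs 256, difference 128
```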

        Only talking about a safe and robust way to do it. Using a function from a module (even your own module) is more readable and usually less error prone - like forgetting to put an e in the right place.

        PS: why the POD formatting instead of following Writeup Formatting Tips? Is there some functionality in PM that I missed?

Re: Command Line Regex
by aquarium (Curate) on Apr 07, 2006 at 11:31 UTC
    if you had the entire file in hex (without silly % characters) then you could use pack/unpack instead of calling in the cavalry (regex)
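For input that really is nothing but hex digits, the pack/unpack idea might look like the sketch below. The sub name hex_to_text and the sample data are illustrative only.

```perl
use strict;
use warnings;

# pack with the "H*" template turns a string of hex digits
# (high nybble first) into the corresponding raw bytes.
sub hex_to_text {
    my ($hex) = @_;
    return pack("H*", $hex);
}

print hex_to_text("48656C6C6F"), "\n";   # prints Hello

# unpack with the same template goes the other way (lowercase digits).
print unpack("H*", "Hello"), "\n";       # prints 48656c6c6f
```

This does not help with the OP's file as it stands, since the % signs and the unescaped text in between would have to be handled first.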
    the hardest line to type correctly is: stty erase ^H
Re: Command Line Regex
by Andrew_Levenson (Hermit) on Apr 08, 2006 at 21:13 UTC
    There shouldn't be a - between the 9 and the A, should there?

    I'm still getting the hang of this stuff, heh.