Imagist has asked for the wisdom of the Perl Monks concerning the following question:

I have to become familiar with a piece of code that I'm going to be working on, but the following code is giving me problems:

$value =~ tr/+/ /;
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
$name =~ tr/+/ /;
$name =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

I have no idea what it does! I understand that it's a regular expression, but while I have some understanding of how regular expressions work, I don't understand this, especially the pack("C",hex($1)) part.

Now, what it's supposed to be doing right now is filtering out special characters, but it seems to me that it should be filtering out all the letters of the alphabet after F too, so I'm not sure what's going on. Could somebody help?

Re: Regular Confuscion
by liverpole (Monsignor) on Jan 31, 2007 at 20:11 UTC
    Hi Imagist,

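    It's not getting rid of special characters; it's converting hexadecimal escapes in the text to the corresponding bytes. Wherever the text (either $value or $name) contains a "%" symbol followed by exactly two hexadecimal digits, those three characters are replaced by the single byte with that value. That is also why the character class stops at F: it matches hexadecimal digits, not letters of the alphabet.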

    Read up on regular expressions, and you'll see that the /e switch at the end has the effect of doing an eval on the replacement text.

    Also, take a look at the hex function.
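A minimal sketch of what /e and hex contribute, assuming an ASCII platform. The variable names and the sample string 'x%41y' are made up for illustration; the substitution itself is the one from the question.

```perl
use strict;
use warnings;

# hex() turns a string of hexadecimal digits into a number.
my $number = hex('41');    # 65

# With /e the replacement is run as Perl code for every match.
my $with_e = 'x%41y';
$with_e =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

# Without /e the replacement is an ordinary interpolated string,
# so the function calls end up in the text literally.
my $without_e = 'x%41y';
$without_e =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/g;

print "$number\n";       # 65
print "$with_e\n";       # xAy
print "$without_e\n";    # xpack("C",hex(41))y
```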


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      Oh, I see. I was wading through the perl documentation on the "pack" function and couldn't see the forest for the trees when I was trying to understand this code. Thanks for the explanation.
Re: Regular Confuscion
by Joost (Canon) on Jan 31, 2007 at 20:37 UTC
      That looks suspiciously like a CGI parameter decoder.

      I quite agree. And if the CGI module is overkill for the problem (since it does so many things), there are other more lightweight solutions, such as CGI::Deurl, which decodes the parameter strings in CGI requests, and that's all it does. Rose::URI and URI::QueryParam are worth having a look at as well. One is bound to find something that suits the problem domain.

      And with regard to the OP, whatever solution is chosen, the exact mangling should be encapsulated in a subroutine, so that one calls:

      $name = decode($name);
      $value = decode($value);

      This way, whatever solution you settle upon, it will be a simple matter to change it to something else when something better comes along, and you won't have to change the rest of your code.
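One way such a subroutine might look, as a sketch only. The name decode comes from the call shown above; the body is simply the two statements from the original question moved into one place, and the sample inputs are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the original two statements so the
# decoding strategy can later be swapped out in a single place.
sub decode {
    my ($text) = @_;
    return unless defined $text;    # nothing to decode

    $text =~ tr/+/ /;                                       # '+' encodes a space
    $text =~ s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;     # %XX -> one byte
    return $text;
}

my $name  = decode('first+name');
my $value = decode('Happy%20Birthday%21');

print "$name\n";     # first name
print "$value\n";    # Happy Birthday!
```

If a module-based decoder is adopted later, only the body of decode has to change; the callers stay as they are.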

      • another intruder with the mooring in the heart of the Perl

      I disagree. It depends on the context of the problem.

      It's possible the original author wrote the script this way for a number of reasons including

      • getting maximum speed by inlining a complex function
      • showing off by reimplementing a URL decoder
      • lack of experience

      It's also possible that the script is not a CGI script, or that it is an exceptionally lightweight CGI script.

      For the most part, I would use the CGI module myself, but there are times when I wouldn't.

        Sure there are situations where you might not use the CGI module, but
        • Optimizing parameter parsing is not going to give you any significant speedup unless you've got tens of thousands of parameters and hardly do anything with them.
        • URL decoders are not hard to code, except that people seem to forget to implement the full spec. (See for example & vs ; separators)
        • Lack of experience is not a valid reason to keep code that could be replaced with a well-tested core module.
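To illustrate the separator point, here is a rough sketch of a hand-rolled parser that accepts both & and ; between pairs. The function name parse_query and the sample query string are assumptions for the example, and this still does not cover the whole spec (character encodings, for instance), which is rather the argument for a tested module.

```perl
use strict;
use warnings;

# Hypothetical parser: returns a hash ref mapping each parameter
# name to an array ref of its decoded values.
sub parse_query {
    my ($query) = @_;
    my %param;

    for my $pair (split /[&;]/, $query) {    # accept both separators
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;   # "flag" with no "=value"

        for ($name, $value) {                # $_ aliases each in turn
            tr/+/ /;
            s/%([a-fA-F0-9]{2})/pack("C", hex($1))/eg;
        }
        push @{ $param{$name} }, $value;     # keep repeated names
    }
    return \%param;
}

my $p = parse_query('a=1&b=two+words;c=%41;a=2');
print join(',', @{ $p->{a} }), "\n";    # 1,2
print "$p->{b}[0]\n";                   # two words
print "$p->{c}[0]\n";                   # A
```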
Re: Regular Confuscion
by GrandFather (Saint) on Jan 31, 2007 at 20:30 UTC
    $value =~ tr/+/ /;

    replaces + characters with spaces

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

    replaces pairs of hex digits preceded by % with the character that has that value. The /e flag causes pack("C",hex($1)) to be executed and the result is used for the substitution string. For example '%41' would become 'A' (assuming ASCII is being used).
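The two steps can be traced by hand in a small sketch, again assuming ASCII. The variable names and the sample string 'a%2Bb+c' are invented; it also shows why tr runs first, so that an encoded plus sign (%2B) survives as a real '+'.

```perl
use strict;
use warnings;

# What the replacement expression does for the match '%41'.
my $digits = '41';                   # the contents of $1
my $number = hex($digits);           # hex string -> integer
my $char   = pack("C", $number);     # integer -> single byte

# Both statements applied in the original order.
my $decoded = 'a%2Bb+c';
$decoded =~ tr/+/ /;                                      # 'a%2Bb c'
$decoded =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

print "$number $char\n";    # 65 A
print "$decoded\n";         # a+b c
```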


    DWIM is Perl's answer to Gödel
Re: Regular Confuscion
by philcrow (Priest) on Jan 31, 2007 at 20:22 UTC
    I'd make a little script with the first two lines you showed but add two more:
    my $value = '%22Happy%20Birthday%21%22';
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;
    print "$value\n";
    Try running that. It should show you the effect of your code, if not the mechanism.

    Phil