I was playing around with log files and regular expressions, hoping to supply a regular expression on the command line and apply it to a log file, when I noticed something odd.

Perl's regular expressions admit \L, \U and \Q directives. The latter is quite useful: it applies quotemeta to the remainder of the string, or up until a \E is encountered. This comes in handy for matching strings containing brackets, dots and all those pesky metacharacters that tend to abound in log files.
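To make the usefulness of \Q concrete, here is a minimal sketch with a pattern written directly in the source. The sample log line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Made-up log line, full of regex metacharacters.
my $line = 'GET /index.html [200] (0.5s)';

# In a literal pattern, \Q...\E escapes the metacharacters in between.
my $literal = $line =~ /\Q[200]\E/ ? 1 : 0;

# quotemeta() does the same job as a function call.
my $needle = quotemeta('[200]');           # \[200\]
my $via_fn = $line =~ /$needle/ ? 1 : 0;

# Without quoting, [200] is a character class: one character, 2 or 0.
my $unquoted = 'x2x' =~ /[200]/     ? 1 : 0;
my $quoted   = 'x2x' =~ /\Q[200]\E/ ? 1 : 0;

print "$literal $via_fn $unquoted $quoted\n";   # prints 1 1 1 0
```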

The trouble is, it doesn't work.

I'll use \U as an example, because it's slightly less mind-bending to follow what's going on. But the same thing applies to all three directives (and it's only \Q that I'm really interested in).

Consider:

print qr/a\Ubc/; # prints (?-xism:aBC)

All is well and good, but what if you want to fetch the pattern from the command line?

perl -le '$patt = shift; print qr/$patt/' 'a\Ubc' # prints (?-xism:a\Ubc)
perl -le '$patt = shift; print qr/$patt/' 'a\\Ubc' # prints (?-xism:a\\Ubc)

I.e., I tried doubling up the backslashes in case the shell was giving me grief, but that's not the problem. In any case, I don't particularly care what the pattern looks like when printed. The main issue is that it doesn't match what it should:

my $patt = shift; # e.g. 'a\Ubc' from the shell
$patt = qr/$patt/;
my $target = 'aBC';
print $target =~ /$patt/; # prints nothing

Now this doesn't match aBC. It doesn't match 'a\Ubc' literally either, for that matter. In fact, I don't know what, if anything, it does match.
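For what it's worth, a quick experiment suggests an answer. This is only a sketch, and it assumes a Perl in which the regex engine passes an unrecognised escape through as the literal character (with a warning in the regexp category), so that an interpolated \U behaves like a plain U. The third candidate string, 'aUbc', is my own addition to the test.

```perl
use strict;
use warnings;
no warnings 'regexp';   # assumption: this silences the "Unrecognized escape" warning

my $patt = 'a\Ubc';     # what the shell hands over for 'a\Ubc'
my $re   = qr/$patt/;

# Try the intended target, the literal text, and the pattern with \U read as U.
for my $target ('aBC', 'a\Ubc', 'aUbc') {
    printf "%-6s %s\n", $target, ($target =~ $re ? 'matches' : 'no match');
}
```

If that assumption holds, only 'aUbc' matches, which would mean the \U never reaches the part of Perl that handles case conversion.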

I have figured out one way to make it work: put the qr// expression inside a string eval and all is well:

my $patt = shift;
$patt = eval "qr/$patt/"; # eeeww
# patt is now (?-xism:aBC) if given 'a\Ubc'
my $target = 'aBC';
print $target =~ /$patt/; # prints 1

Now all is fine, but the cure is worse than the disease. Any person reading the code will quickly spot that they could have a lot of fun by specifying a pattern such as /.`rm -rf /`./ and then you are in a world of pain.

At this point, the only way out of this conundrum that I can see is to either hand parse the pattern (erk) or use a Safe compartment (re-erk).
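The hand-parsing route might look something like the sketch below. The helper name expand_escapes is hypothetical, and the approach is deliberately naive: it handles only \Q, \U and \L up to the next \E or the end of the string, and ignores nesting, \u and \l, and a doubled backslash in front of the letter.

```perl
use strict;
use warnings;

# Hypothetical helper: expand \Q, \U and \L by hand, so that no
# string eval is needed on a pattern that came from the command line.
sub expand_escapes {
    my ($patt) = @_;
    # Each directive runs up to the next \E, or to the end of the string.
    $patt =~ s/\\Q(.*?)(?:\\E|\z)/quotemeta($1)/ges;
    $patt =~ s/\\U(.*?)(?:\\E|\z)/uc($1)/ges;
    $patt =~ s/\\L(.*?)(?:\\E|\z)/lc($1)/ges;
    return $patt;
}

my $expanded = expand_escapes('a\Ubc');   # aBC
my $re       = qr/$expanded/;

print "matched\n" if 'aBC' =~ $re;        # prints matched
print expand_escapes('\Q[error]\E: .*'), "\n";   # prints \[error\]: .*
```

Because the string is only ever transformed and then interpolated into qr//, nothing the user types gets executed as Perl code, which is the whole point of avoiding the eval.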

I think, however, that my thinking is stuck in some sort of conceptual rut. I can't be the first person to stumble across this behaviour and there must be something really obvious I'm missing. In which case, upside smacks to the head would be most appreciated.

- another intruder with the mooring in the heart of the Perl


In reply to qr/string/ is not the same as qr/$var/ ? by grinder
