I was playing around with log files and regular expressions, wanting to supply a regular expression on the command line and run it against a log file. And I noticed something odd.
Perl's regular expressions admit the \L, \U and \Q directives. The last of these is quite useful: it applies quotemeta to the remainder of the string, or up to the next \E. This comes in handy for matching strings containing brackets, dots and all the other pesky metacharacters that tend to abound in log files.
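A minimal illustration of the \Q...\E behaviour just described, written directly in source code, where the directive is honoured. The sample string is made up for the example; the point is only that "\Q...\E" and quotemeta produce the same escaped text, and that the result matches the metacharacters literally.

```perl
use strict;
use warnings;

# A made-up log fragment full of metacharacters.
my $literal = 'foo[1].bar';

# \Q...\E quotes everything between them, as quotemeta() would.
my $quoted = "\Q$literal\E";

# Both forms produce the same escaped text: foo\[1\]\.bar
print "same\n" if $quoted eq quotemeta($literal);

# Inside a pattern, the brackets and the dot now match literally.
my $re = qr/^\Q$literal\E$/;
print "literal match\n" if 'foo[1].bar' =~ $re;
print "no match\n"      unless 'foo1Xbar' =~ $re;
```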
The trouble is, it doesn't work.
I'll use \U as an example, because it's slightly less mind-bending to follow what's going on. But the same thing applies to all three directives (and it's only \Q that I'm really interested in).
Consider:
    print qr/a\Ubc/;   # prints (?-xism:aBC)

All is well and good, but what if you want to fetch the pattern from the command line?
    perl -le '$patt = shift; print qr/$patt/' 'a\Ubc'    # prints (?-xism:a\Ubc)
    perl -le '$patt = shift; print qr/$patt/' 'a\\Ubc'   # prints (?-xism:a\\Ubc)
That is, I tried doubling the backslashes in case the shell was giving me grief, but that isn't the problem. In any case, I don't particularly care what the stringified pattern looks like; the real issue is that it doesn't match what it should:
    my $patt = shift;            # e.g. 'a\Ubc' from the shell
    $patt = qr/$patt/;
    my $target = 'aBC';
    print $target =~ /$patt/;    # prints nothing
This doesn't match aBC. Nor, for that matter, does it match 'a\Ubc' literally. In fact, I don't know what, if anything, it does match.
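The symptoms are consistent with where Perl handles these directives: \U, \L and \Q are applied while Perl parses a double-quotish literal in the source (perlop's "Gory details of parsing quoted constructs" describes this), and a value interpolated from a variable is not rescanned for them. A small sketch, with a variable standing in for the command-line argument; it deliberately compares plain strings rather than regexes, so it does not depend on how the regex engine treats a stray \U.

```perl
use strict;
use warnings;

# Stand-in for shift(@ARGV): the five characters a, \, U, b, c.
my $from_argv = 'a\Ubc';

# In source code, \U is applied while the literal is parsed.
my $in_source = "a\Ubc";            # 'aBC'

# Interpolating the variable copies its characters as-is; the
# backslash-U inside the value is not treated as a case directive.
my $interpolated = "$from_argv";    # still 'a\Ubc'

printf "%s\n%s\n", $in_source, $interpolated;
```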
I have figured out one way to make it work: put the qr// expression inside a string eval and all is well:
    my $patt = shift;
    $patt = eval "qr/$patt/";    # eeeww
    # $patt is now (?-xism:aBC) if given 'a\Ubc'
    my $target = 'aBC';
    print $target =~ /$patt/;    # prints 1
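Now all is fine, but the cure is worse than the disease: the pattern is pasted into Perl source and compiled. Anyone reading the code will quickly spot that they could have a lot of fun by specifying a pattern such as /.`rm -rf /`./ (which closes the qr// early and runs the backticked command), and then you are in a world of pain.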
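To see the danger without risking anything, here is a harmless stand-in for such a pattern. It uses a different injection route from the backticks above (an @{[ ... ]} interpolation, which is evaluated as code inside the eval'd qr//), and its payload only sets a flag; the variable names are made up for the example.

```perl
use strict;
use warnings;

our $ran = 0;

# A hostile "pattern": @{[ ... ]} is interpolated by running code.
# This payload only sets a flag instead of doing anything nasty.
my $patt = '@{[ $main::ran = 1 ]}';

# The same string-eval trick as before: the text becomes Perl source.
my $re = eval "qr/$patt/";

print "code ran\n" if $ran;
```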
At this point, the only ways out of this conundrum that I can see are to hand-parse the pattern (erk) or to use a Safe compartment (re-erk).
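For what it's worth, the hand-parsing route need not be large if only \Q...\E matters. The sketch below uses a hypothetical helper, expand_quotemeta, that rewrites each \Q...\E section of an externally supplied pattern with quotemeta and leaves the rest alone. Because the result is only interpolated into qr// and never eval'd, no Perl source is built from user input; Perl also refuses (?{ ... }) code blocks in run-time interpolated patterns unless use re 'eval' is in effect. It is an assumption-laden sketch, not a full emulation of Perl's quoting rules.

```perl
use strict;
use warnings;

# Hypothetical helper: quote the contents of each \Q...\E section.
# An unterminated \Q runs to the end of the string, as in source code.
# Limitations: \U and \L are not handled, and an escaped backslash
# such as '\\Q' is still treated as the start of a \Q section.
sub expand_quotemeta {
    my ($src) = @_;
    $src =~ s/\\Q(.*?)(?:\\E|\z)/quotemeta($1)/gse;
    return $src;
}

my $user_patt = 'foo\Q[1].bar\E\d+';    # e.g. from shift(@ARGV)
my $expanded  = expand_quotemeta($user_patt);
my $re        = qr/$expanded/;

print "$expanded\n";                     # foo\[1\]\.bar\d+
print "match\n" if 'foo[1].bar42' =~ $re;
```

Applying \U or \L the same way would be riskier, since uppercasing a pattern can silently turn escapes like \d into \D.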
I think, however, that my thinking is stuck in some sort of conceptual rut. I can't be the first person to stumble across this behaviour, and there must be something really obvious I'm missing. If so, smacks upside the head would be most appreciated.
- another intruder with the mooring in the heart of the Perl
In reply to qr/string/ is not the same as qr/$var/ ? by grinder