Re: Regex Grumblings (Variable Interpolation)
by jeroenes (Priest) on May 23, 2001 at 15:20 UTC
Try the same with $foo="$bar";. The thing is,
single quotes will get the characters literally ($,b,a,r)
but doubles will interpolate (w,h,a,t,e,v,e,r,_,b,a,r,_,w,a,s).
A regex reads a '$' as either the end of the string or the start of
a variable to interpolate. However, interpolating $foo just gives you
the literal ($,b,a,r) back, whether or not you use \Q; those characters
are not interpolated a second time. So:
$_='Some string with $foo and bar';
$bar='bar';
$foo='$bar';
/$bar/ and print "$&\n";
/$foo/ and print "$&\n";
/\Q$foo\E/ and print "$&\n";
Just prints one 'bar'. Take a look at perlre and perlop.
Cheers, Jeroen
"We are not alone"(FZ)

This is precisely why I made some test code to explore this,
quite similar, in fact:
$foo = '$bar';
$bar = 'snafu';
$_ = 'I am certain that the value of $bar is "snafu".';
print $_,"\n";
s/$foo/BAR/g;
print $_,"\n";
However, it doesn't interpolate '$bar' into anything
meaningful, and as such, 'snafu' does not get replaced as
one might surmise.

Of course it doesn't. A $ within a regex means end-of-line,
and trying to match something (nonempty) after the end of the line
cannot succeed within a single-line match.
(But I have to admit, I had to run Perl for this
and then stare at the output for some time)
Update: I can't confirm jeroenes' findings with
Perl 5.003 under solaris. I only get one bar printed
and no substitution.

Even better, I then tried some deliberate typos to
check for funny things. And guess what, I found something
funny!
$_='Some string with $foo and bar';
$bar='bar';
$foo='$bar';
/$bar/ and print "$&\n";
/$foo/ and print "$&\n";
/\Q$fooE/ and print "$&\n";
s/\Q$fooE/XYZ/;
print "$_\n";
Prints bar twice, and replaces bar by XYZ! This is definitely
very strange... is this a bug or what?
Jeroen
"We are not alone"(FZ)
Update: This was run with perl5.6/linux:
"This is perl, v5.6.0 built for i386-linux "
(2) grinder I made this typo on purpose.
(3) Thx grinder, things are as they should be now :-)
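A likely explanation for the surprise above, offered as a sketch rather than a diagnosis of that exact run: with the typo, $fooE is an unset variable, so the pattern interpolates to an empty string, and perlop documents that an empty pattern reuses the last successfully matched pattern. The variable $empty below is a stand-in for the unset $fooE (an assumption, so that the example runs cleanly under strict and warnings).

```perl
use strict;
use warnings;

my $text  = 'Some string with $foo and bar';
my $bar   = 'bar';
my $empty = '';    # stand-in for the unset $fooE from the post

my @seen;

# A successful match; Perl remembers this pattern.
push @seen, $& if $text =~ /$bar/;

# \Q applied to an empty string is still empty, so this pattern is
# empty and the last successfully matched pattern is used instead.
push @seen, $& if $text =~ /\Q$empty\E/;

# The same fallback applies to the pattern side of s///.
(my $changed = $text) =~ s/\Q$empty\E/XYZ/;

print "@seen\n";
print "$changed\n";
```

If that reading is right, the second match and the substitution are both really matching 'bar', which agrees with the behaviour reported in the post.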
You are right, Perl's regexes do undergo variable interpolation, but it's only done once: $bar becomes the value it holds, and $foo becomes $bar, which doesn't then become $bar's value.
The reason neither of your bits of code works (assuming you are using the same $_ value for both) is that once $foo is interpolated to $bar, the regex becomes s/$bar/BAR/g, which is $ (the end of line) followed by the characters 'bar'.
I expect there is a way to fiddle with the end-of-line character and then do s/$foo/BAR/m (treat string as multi-line; or maybe it's s/$foo/BAR/s, treat string as single-line, I can never remember) to get it to match your $_ if you took the $ out (i.e. if the \b before 'bar' somehow became an end of line), but I'll have to open that one up as it's over my head. (Where's japhy when we need him?)
The reason \Q$foo\E works is that after $foo is interpolated to $bar, the \Q...\E slaps a \ in front of the $, so it is treated literally rather than as the end-of-line marker.
You will find that your second lot of code will work if you do:
$foo = '\$bar'; # put the backslash in yourself
$bar = 'snafu';
$_ = 'I am certain that the value of $bar is "snafu".';
print $_,"\n";
s/$foo/BAR/g;
print $_,"\n";
Hope this helps, larryk
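To make the "put the backslash in yourself" point concrete, here is a small sketch using the built-in quotemeta function, which produces the same escaping that \Q...\E applies inside a pattern. The variable names $quoted, $auto and $manual are only illustrative.

```perl
use strict;
use warnings;

my $foo  = '$bar';
my $text = 'I am certain that the value of $bar is "snafu".';

# quotemeta returns the string with non-word characters backslashed,
# which is what \Q$foo\E inserts into the pattern.
my $quoted = quotemeta $foo;

# Escaping via \Q...\E at match time.
(my $auto = $text) =~ s/\Q$foo\E/BAR/g;

# Escaping written by hand in the pattern.
(my $manual = $text) =~ s/\$bar/BAR/g;

print "$quoted\n";
print "$auto\n";
```

Both substitutions should replace the literal text $bar and leave "snafu" alone.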
Re: Regex Grumblings (Variable Interpolation)
by merlyn (Sage) on May 23, 2001 at 18:26 UTC
My question is: Why is the regexp compiler lenient enough to recognize $bar straight out as a variable reference, but one level removed (via $foo) and it will not operate?
This is a Good Thing from a security perspective.
Suppose you had a place for me to type a regex on a web form. So it shows up
in a Perl variable, which you interpret dynamically as above. If I learn of that,
I merely enter $foo[`evil command`] for my regex, and I've
now haxored your system.
No, the present system must stay. It's the only way to ensure no "double-level"
of interpretation, an absolute requirement for security.
-- Randal L. Schwartz, Perl hacker
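A minimal sketch of the single level of interpolation described above. The names $secret and $input are hypothetical, standing in for a value the program holds and for text typed into a form; nothing here comes from the original posts.

```perl
use strict;
use warnings;

our $secret = 'hunter2';    # hypothetical value held by the program
my $input   = '$secret';    # hypothetical text typed into a form

# The pattern becomes the characters $secret: an end-of-line anchor
# followed by 'secret'. The variable $secret is never looked up.
my $leaked = ('hunter2' =~ /$input/) ? 1 : 0;

# With \Q...\E the input is matched purely as literal text.
my $literal = ('the text $secret appears here' =~ /\Q$input\E/) ? 1 : 0;

print "leaked=$leaked literal=$literal\n";
```

The input is used as regex source once and is not expanded again, so it cannot be used to read the variable's value.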
I'm probably missing the boat here, but the documentation
claims that since "patterns are processed as double-quoted
strings, the normal double-quoted interpolations will work."1
my $foo = "`ls`"; # "Evil" command
my $bar = "$foo";
print $bar,"\n";
All you get is:
`ls`
I wasn't hoping for a miracle to occur, just that $foo
would be translated as literal string '$bar', and
that the '$' would be recognized as just another ASCII
character, not the end of line anchor. After all, if we're
on the subject of evil, now this means that you can put
all sorts of wacky stuff in your variable and it gets
interpolated as regexp material, or at least stops
your program with an error:
my $foo = '(?{die})';
s/$foo/XYZ/g; # Eval-group not allowed at runtime, use re 'eval'
Maybe there should be a switch for regexps which causes any
interpolated strings to be interpreted as just text,
with any special meaning disregarded. Of course, you can always
do this with \Q and \E...
1 Programming Perl, 2nd Ed., pg. 60
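A short sketch of the two behaviours mentioned in this post, assuming no `use re 'eval'` is in effect: an interpolated (?{...}) group is refused when the pattern is compiled at run time, while \Q...\E turns the same string into plain text. The variable names $ok, $error and $literal are illustrative only.

```perl
use strict;
use warnings;

my $foo = '(?{die})';

# Compiling the interpolated pattern fails, so trap it with eval.
my $ok    = eval { 'anything' =~ /$foo/; 1 };
my $error = $ok ? '' : $@;

# With \Q...\E the string is only text, so it matches itself literally.
my $literal = ('text with (?{die}) in it' =~ /\Q$foo\E/) ? 1 : 0;

print "error: $error";
print "literal match: $literal\n";
```

The embedded code is not run in either case: the first pattern is rejected before any matching happens, and the second contains no code block at all.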
if we're on the subject of evil, now this
means that you can put all sorts of wacky stuff in your variable and it gets interpolated as regexp material,
Um, that's not evil, that's intentional. How else would you store a regular expression in a variable for later use? (Remember that qr// is only a recent addition to Perl.)
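For comparison, a small sketch of both ways of keeping a pattern in a variable: as a plain string that is interpolated into a match, and as a compiled qr// object. The names $as_string, $as_qr and the sample items are made up for illustration.

```perl
use strict;
use warnings;

my $as_string = '\d+';      # pattern kept as a plain string
my $as_qr     = qr/\d+/;    # pattern compiled with qr//

# Either form can be interpolated into a larger pattern.
my @hits = grep { /$as_string/ && /\Aitem-$as_qr\z/ }
           qw(item-1 item-x item-42);

print "@hits\n";
```

In both forms the metacharacters keep their regex meaning, which is the intended behaviour being defended here.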
Re: Regex Grumblings (Variable Interpolation)
by chipmunk (Parson) on May 23, 2001 at 19:27 UTC
$bar = 'snafu';
$foo = '$bar';
$_ = '$bar snafu';
s/$foo/XYZ/;
print;
I think that you are expecting one of two results from this code, but I'm not sure which...
- $bar XYZ, as $foo interpolates to $bar, then $bar interpolates to snafu.
- XYZ snafu, as $foo interpolates to $bar, and $bar matches literally.
Neither of those would be correct behavior.
- Double-quoted strings don't interpolate recursively; neither do regexes. $bar = 'snafu'; $foo = '$bar'; $_ = "$foo"; leaves $_ with the value '$bar', not 'snafu'.
- If the value of $foo were '.*', would that match any number of characters, or only the two literal characters period and asterisk? It's the former, of course, because . and * are metacharacters. $ is the same way; backslash it if you want to match a literal dollar sign.
The correct output is $bar snafu, because the substitution doesn't find a match for m'$bar'.
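The points above can be checked with a short sketch. The variables $copy, $plain, $meta, $as_regex and $as_text are illustrative names, not part of the original code.

```perl
use strict;
use warnings;

my $bar = 'snafu';
my $foo = '$bar';

# Double-quoted interpolation happens once: $copy holds the
# four characters $bar, not 'snafu'.
my $copy = "$foo";

# The pattern is an anchor followed by 'bar', so nothing matches
# and the string is left unchanged.
(my $plain = '$bar snafu') =~ s/$foo/XYZ/;

# Metacharacters in an interpolated variable keep their meaning
# unless they are quoted.
my $meta     = '.*';
my $as_regex = ('abc' =~ /^$meta$/)       ? 1 : 0;
my $as_text  = ('abc' =~ /^\Q$meta\E$/)   ? 1 : 0;

print "$copy\n$plain\n";
print "as regex: $as_regex, as text: $as_text\n";
```

Here '.*' matches 'abc' when used as a regex but not when quoted, which is the same distinction that applies to the dollar sign.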