tadman has asked for the wisdom of the Perl Monks concerning the following question:

Mastering regular expressions is a perpetual goal: just as you think you're on top of things, you discover something new. While diagnosing a problem, I've encountered a behaviour that is a little peculiar.
s/$bar/XYZ/g;
is not equivalent to:
$foo = '$bar'; s/$foo/XYZ/g;
The former will substitute properly, and the latter will not. However, if you collar it, things will work out:
$foo = '$bar'; s/\Q$foo\E/XYZ/g;
My question is: Why is the regexp compiler lenient enough to recognize $bar straight out as a variable reference, but one level removed (via $foo) and it will not operate? This seems like a double standard. The documentation I have read indicates that standard double-quote interpolation occurs, which would seem to suggest that $foo is interpolated into the string '$bar', and then processed accordingly.

Replies are listed 'Best First'.
Re: Regex Grumblings (Variable Interpolation)
by jeroenes (Priest) on May 23, 2001 at 15:20 UTC
    Try the same with $foo="$bar";. The thing is, single quotes will get the characters literally ($,b,a,r) but doubles will interpolate (w,h,a,t,e,v,e,r,_,b,a,r,_,w,a,s).

    A regex reads a '$' as the end of the string or a variable to interpolate. However, with $foo you just get the literal ($,b,a,r) back, no matter if you use \Q or not. So:

    $_='Some string with $foo and bar';
    $bar='bar';
    $foo='$bar';
    /$bar/ and print "$&\n";
    /$foo/ and print "$&\n";
    /\Q$foo\E/ and print "$&\n";
    Just prints one 'bar'. Take a look at perlre and perlop.

    Cheers,

    Jeroen
    "We are not alone"(FZ)

      This is precisely why I made some test code to explore this, quite similar, in fact:
      $foo = '$bar';
      $bar = 'snafu';
      $_ = 'I am certain that the value of $bar is "snafu".';
      print $_,"\n";
      s/$foo/BAR/g;
      print $_,"\n";
      However, it doesn't interpolate '$bar' into anything meaningful, and as such, 'snafu' does not get replaced as one might surmise.

        Of course it doesn't. $ within a regex means End-Of-Line. And trying to match something (nonempty) after the end of the line is not successful within a single-line match.

        (But I have to admit, I had to run Perl for this and then stare at the output for some time)

        Update: I can't confirm jeroenes' findings with Perl 5.003 under solaris. I only get one bar printed and no substitution.

        But even better, then I tried some deliberate typos, to check for funny things. And guess what, I found something funny!
        $_='Some string with $foo and bar';
        $bar='bar';
        $foo='$bar';
        /$bar/ and print "$&\n";
        /$foo/ and print "$&\n";
        /\Q$fooE/ and print "$&\n";
        s/\Q$fooE/XYZ/;
        print "$_\n";
        Prints bar twice, and replaces bar by XYZ! This is definitely very strange... is this a bug or what?

        Jeroen
        "We are not alone"(FZ)
        Update: This was run with perl5.6/linux: "This is perl, v5.6.0 built for i386-linux "
        (2) grinder I made this typo on purpose.
        (3) Thx grinder, things are as they should be now :-)
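A likely explanation for the surprise above, offered as a sketch rather than a diagnosis of that exact run: perlop documents that a pattern which evaluates to the empty string reuses the last successfully matched pattern. The typo \Q$fooE interpolates the unset variable $fooE, which leaves an empty pattern. The lexical names below are illustrative, and $fooE is set to '' explicitly (an assumption) so the example runs under strict.

```perl
use strict;
use warnings;

my $text = 'Some string with $foo and bar';
my $bar  = 'bar';
my $fooE = '';   # assumption: stands in for the unset $fooE created by the typo

my @seen;
push @seen, $& if $text =~ /$bar/;       # ordinary match on "bar"
push @seen, $& if $text =~ /\Q$fooE/;    # empty pattern: the last successful pattern is reused

(my $copy = $text) =~ s/\Q$fooE/XYZ/;    # the same rule applies to s///

print "@seen\n";
print "$copy\n";
```

So the second "bar" and the substitution come from the earlier /$bar/ pattern, not from anything in $foo.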

        You are right, Perl's regexes do undergo variable interpolation, but it's only done once, i.e. $bar becomes the value it holds, and $foo becomes $bar, which doesn't then become $bar's value.

        The reason neither of your bits of code works (assuming you are using the same $_ value for both) is that once $foo is interpolated to $bar, the regex becomes s/$bar/BAR/g, which is $ (the end of line) followed by the characters 'bar'.

        I expect there is a way to fiddle with the end of line character and then do a s/$foo/BAR/m #treat string as multi-line (or maybe it's s/$foo/BAR/s # treat string as single-line - I can never remember) to get it to match your $_ if you took the $ out (ie. if the \b before 'bar' somehow became an end of line) but I'll have to open that one up as it's over my head. (where's japhy when we need him?)

        The reason \Q$foo\E works is because after $foo is interpolated to $bar the \Q\E slaps a \ in front of the $ so it is treated literally rather than as the end-of-line marker.

        You will find that your second lot of code will work if you do:

        $foo = '\$bar'; # put the backslash in yourself
        $bar = 'snafu';
        $_ = 'I am certain that the value of $bar is "snafu".';
        print $_,"\n";
        s/$foo/BAR/g;
        print $_,"\n";
        Hope this helps, larryk
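The effect of \Q...\E described in this reply can be checked directly. This is a minimal sketch using quotemeta(), the function form of \Q; the variable names $escaped, $plain_hits and $fixed are illustrative and not from the thread.

```perl
use strict;
use warnings;

my $foo  = '$bar';    # four literal characters: $ b a r
my $text = 'I am certain that the value of $bar is "snafu".';

# quotemeta() backslashes every non-word character, just as \Q...\E does.
my $escaped = quotemeta($foo);

# Unescaped: the pattern is "$" (end-of-line anchor) followed by "bar",
# so it cannot match. The "= () =" idiom counts the matches.
my $plain_hits = () = $text =~ /$foo/g;

# Escaped: the pattern matches the literal text "$bar".
(my $fixed = $text) =~ s/\Q$foo\E/BAR/g;

print "escaped pattern:   $escaped\n";
print "unescaped matches: $plain_hits\n";
print "$fixed\n";
```

Writing the backslash into the string by hand, as in '\$bar', gives the same pattern as \Q$foo\E here.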
Re: Regex Grumblings (Variable Interpolation)
by merlyn (Sage) on May 23, 2001 at 18:26 UTC
    My question is: Why is the regexp compiler lenient enough to recognize $bar straight out as a variable reference, but one level removed (via $foo) and it will not operate?
    This is a Good Thing from a security perspective.

    Suppose you had a place for me to type a regex on a web form. So it shows up in a Perl variable, which you interpret dynamically as above. If I learn of that, I merely enter $foo[`evil command`] for my regex, and I've now haxored your system.

    No, the present system must stay. It's the only way to ensure no "double-level" of interpretation, an absolute requirement for security.

    -- Randal L. Schwartz, Perl hacker

      I'm probably missing the boat here, but the documentation claims that since "patterns are processed as double-quoted strings, the normal double-quoted interpolations will work." [1]
      my $foo = "`ls`"; # "Evil" command
      my $bar = "$foo";
      print $bar,"\n";
      All you get is:
      `ls`
      I wasn't hoping for a miracle to occur, just that $foo would be translated as the literal string '$bar', and that the '$' would be recognized as just another ASCII character, not the end-of-line anchor. After all, if we're on the subject of evil, now this means that you can put all sorts of wacky stuff in your variable and it gets interpolated as regexp material, or at least jostles your program with a warning:
      my $foo = '(?{die})';
      s/$foo/XYZ/g; # Eval-group not allowed at runtime, use re 'eval'
      Maybe there should be a switch for regexps that causes any interpolated strings to be treated as plain text, with any special meaning disregarded. Of course, you can always do this with \Q and \E...
      [1] Programming Perl, 2nd Ed., pg. 60
        if we're on the subject of evil, now this means that you can put all sorts of wacky stuff in your variable and it gets interpolated as regexp material,
        Um, that's not evil, that's intentional. How else would you store a regular expression in a variable for later use? (Remember that qr// is only a recent addition to Perl.)
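To illustrate the point about keeping a regular expression in a variable, here is a small sketch comparing a pattern held in a plain string, the same string quoted with \Q...\E, and a qr// object. The sample text and variable names are made up for the example.

```perl
use strict;
use warnings;

my $text = 'foo.bar and fooXbar';

# A pattern kept in a plain string: metacharacters stay active when interpolated.
my $as_string = 'foo.bar';
my @loose = $text =~ /($as_string)/g;          # "." matches any character

# The same string wrapped in \Q...\E is matched literally.
my @literal = $text =~ /(\Q$as_string\E)/g;

# qr// compiles the pattern and yields a reusable regex object.
my $compiled = qr/foo.bar/;
my @from_qr  = $text =~ /($compiled)/g;

print "string pattern: @loose\n";
print "quoted:         @literal\n";
print "qr// pattern:   @from_qr\n";
```

The string and the qr// object behave the same way as patterns; only \Q...\E turns the metacharacters off.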
Re: Regex Grumblings (Variable Interpolation)
by chipmunk (Parson) on May 23, 2001 at 19:27 UTC
    Consider this snippet:
    $bar = 'snafu';
    $foo = '$bar';
    $_ = '$bar snafu';
    s/$foo/XYZ/;
    print;
    I think that you are expecting one of two results from this code, but I'm not sure which...

    • $bar XYZ, as $foo interpolates to $bar, then $bar interpolates to snafu.
    • XYZ snafu, as $foo interpolates to $bar, and $bar matches literally.

    Neither of those would be correct behavior.

    • Double-quoted strings don't interpolate recursively; neither do regexes. $bar = 'snafu'; $foo = '$bar'; $_ = "$foo"; leaves $_ with the value '$bar', not 'snafu'.
    • If the value of $foo were '.*', would that match any number of characters, or only the two literal characters period and asterisk? It's the former, of course, because . and * are metacharacters. $ is the same way; backslash it if you want to match a literal dollar sign.

    The correct output is $bar snafu, because the substitution doesn't find a match for m'$bar'
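
The two points above can be verified with a short script. This is a sketch using lexical variables under strict, so the names $once, $changed and $escaped are additions for illustration.

```perl
use strict;
use warnings;

my $bar = 'snafu';
my $foo = '$bar';

# Interpolation happens once: "$foo" yields the characters $bar, not snafu.
my $once = "$foo";

# Inside a pattern the interpolated "$" is the end-of-line anchor,
# so the substitution finds nothing to replace.
my $unescaped = '$bar snafu';
my $changed   = ($unescaped =~ s/$foo/XYZ/) ? 1 : 0;

# Backslashing the dollar sign (here via \Q...\E) matches it literally.
(my $escaped = '$bar snafu') =~ s/\Q$foo\E/XYZ/;

print "$once\n";
print "$unescaped (changed: $changed)\n";
print "$escaped\n";
```

The value of $bar never enters the picture, which is the single level of interpolation described above.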