Locutus has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
From what I thought I knew about Perl's rules for variable interpolation, I would have expected that after setting

  my $regexp = 'some$thing';

the condition /$regexp/ is true if $_ contains the string 'some$thing' (because by default only one level of interpolation is performed, and thus /$regexp/ should be equivalent to /some\$thing/) - but it isn't!

I also defined

  my $thing = 'THING';

and then tested whether /$regexp/ was true for any $_ in ( 'someTHING', 'some', "some\n" ), but it was still false every time.

Is there any value for $_ at all such that /$regexp/ (with $regexp defined as above) becomes true, and why / why not?

Humble greetings,
Locutus

Re: variable interpolation in regexps
by Corion (Patriarch) on Nov 24, 2006 at 13:22 UTC

    You're running into regular expression meta characters, like $. You want to look at perlre and use \Q...\E or quotemeta for interpreting your string as literal characters in regular expression interpolation:

    my $regexp = qr/\Qsome$thing\E/;
    ...
    if (/$regexp/) {

    or alternatively

    my $regexp = quotemeta('some$thing');
    ...
    if (/$regexp/) {
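The two snippets above do different things, which a small side-by-side test may make clearer. This is only a sketch: it assumes $thing holds 'THING' as in the question, and the variable names $interpolated and $literal are made up for illustration.

```perl
use strict;
use warnings;

my $thing = 'THING';

# \Q...\E inside qr//: $thing is interpolated first, then the result is quoted.
my $interpolated = qr/\Qsome$thing\E/;

# quotemeta on a single-quoted string: no interpolation, the '$' gets escaped.
my $literal = quotemeta('some$thing');

for my $candidate ('someTHING', 'some$thing') {
    printf "%-12s interpolated: %-8s  literal: %s\n",
        $candidate,
        ($candidate =~ $interpolated ? 'match' : 'no match'),
        ($candidate =~ /$literal/    ? 'match' : 'no match');
}
```

The first pattern matches only the string 'someTHING', the second only the string that contains a real dollar sign.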
      With respect I don't think it is seeing $ as an RE meta-character, but interpolation is looking for a variable called $thing. My understanding is that $ only means 'end-of-text' at the end of an RE expression, unless /m is specified.
      Using \Q will still fix it though.

        I now realize that I only saw half of the problem. The wanted idea seems to be double-interpolation of 'some$thing' into "someTHING" by replacing the string '$thing' within $regexp with the value of the variable $thing.

        Your points are close but wrong - Perl allows stuff to appear in regular expressions after the $ metacharacter. Otherwise, Perl regular expressions act like double-quoted strings and hence interpolate only once:

        #!perl -wl
        use strict;

        my $regexp = 'some$thing';
        my $thing  = 'THING';
        my $target = 'this is some$thing strange.';

        print $regexp;
        print $target;
        print '$target =~ /$regexp/ ', $target =~ /$regexp/;
        print '$target =~ /\Q$regexp\E/ ', $target =~ /\Q$regexp\E/;
        print '$target =~ /some\$thing/ ', $target =~ /some\$thing/;
        print '$target =~ /some$/ ', $target =~ /some$/;

        my $eol_in_the_middle = qr/some$(?:thing)/;
        print 'eol_in_the_middle ', $eol_in_the_middle;
        print '$target =~ /$eol_in_the_middle/ ', $target =~ /$eol_in_the_middle/;

        Of course, it's kinda hard to make $eol_in_the_middle match, but you can do it by using the /m switch and changing the RE and target string a bit:

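        # /m has to be part of the qr// itself: modifiers are stored in the
        # compiled pattern, so adding them only at the point of use has no effect.
        my $eol_in_the_middle2 = qr/some$(?:\s*thing)/m;
        my $target2 = "this is some\nthing strange.";
        print '$target2 =~ /$eol_in_the_middle2/ ', $target2 =~ /$eol_in_the_middle2/;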
      Thanks, Corion, for
      - responding so quickly,
      - pointing me to the relevant documentation, and
      - providing the two code snippets which do the one (double interpolation by means of qr{\Q...\E}) or the other trick (single interpolation by means of quotemeta).

      I'm a little astonished that apparently neither of these two is Perl's default behaviour. May I ask you (or any of the other Monks reading) to explain why exactly /$regexp/ is always false in my examples? I would really like to understand what's going on "inside" if I use neither qr{\Q...\E} nor quotemeta, i.e. for which value of $_ the condition would be true.

        I first didn't understand what you want to do, but cdarke pointed me to it.

        It seems you want to solve two problems:

        Firstly, you want to use the value in $regexp as some kind of template which later on uses the value of $thing to replace the string $thing within some$thing. This does not happen, because regular expressions are, to Perl, more or less like double-quoted strings (see perlop, section Quotes and quote-like operators). So, your $regexp will always remain some$thing and never access the value of the variable named $thing.

        \Q...\E don't do double interpolation, they do quoting, so the regex meta characters are not seen as meta characters anymore and thus don't help you with your problem.

        Having double interpolation the default in Perl would be horrendous, because after double interpolation immediately follows triple interpolation whenever you're using a double-interpolated string in any string context, which would make it nearly impossible to use any modules that construct strings or regular expressions.

        To solve your problem at hand, doing template-like replacement on a string, you don't want to use a variable (and its name), but a hash which contains the name/value pairs:

        my $regexp = 'some$thing';
        my %values = (
            'thing' => 'THING',
            # ... other value mappings
        );
        print $regexp;
        $regexp =~ s/\$(\w+)/$values{$1}/ge;
        print $regexp;
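One possible refinement of the substitution above, offered only as a sketch: placeholders that have no entry in the hash are left untouched instead of being replaced by an empty string. The helper name fill_template and the second placeholder $other are hypothetical.

```perl
use strict;
use warnings;

my %values = ( thing => 'THING' );

# Hypothetical helper: replace $name placeholders that have a mapping
# in %values and leave all other placeholders exactly as they were.
sub fill_template {
    my ($template) = @_;
    $template =~ s/\$(\w+)/exists $values{$1} ? $values{$1} : "\$$1"/ge;
    return $template;
}

print fill_template('some$thing and $other'), "\n";
```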

        If you're still wondering, yes, it's possible to do the replacement on variables by name too, but Dominus has explained far better than I could why it's stupid to use a variable as a variable name.

        To address your last question, the regular expression formed by $regexp = qr/some\$thing/ matches any string that contains the substring some$thing literally, that is, with a dollar sign in the middle.

Re: variable interpolation in regexps
by Firefly258 (Beadle) on Nov 24, 2006 at 18:17 UTC
    There is no variable interpolation in my $regexp = 'some$thing';. The content of $regexp is now the literal string some$thing, since you used single quotes.

    The literal string some$thing used as a regex is not equivalent to /some\$thing/, because $ is interpreted as a regex metacharacter denoting EOL. It is equivalent to /some$thing/ though; I hope the difference is obvious now.
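A short check of this, as a sketch only (the candidate strings are the ones tried in the thread plus two more, and the counter $matches is just for illustration): the pattern is interpolated once, so the regex engine sees "some", then the end-of-line anchor, then "thing".

```perl
use strict;
use warnings;

my $regexp = 'some$thing';   # single quotes: the string holds a literal '$'

my @candidates = ('someTHING', 'some', "some\n", 'some$thing', "some\nthing");

# Count matches with and without /m; the anchor followed by "thing"
# cannot be satisfied by any of these strings.
my $matches = 0;
for my $candidate (@candidates) {
    $matches++ if $candidate =~ /$regexp/;
    $matches++ if $candidate =~ /$regexp/m;
}
print "matches found: $matches\n";

# Escaping the '$' turns it into an ordinary character.
my $escaped_ok = ('some$thing' =~ /some\$thing/) ? 1 : 0;
print "escaped pattern matches: $escaped_ok\n";
```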

    If you wanted to interpolate $thing into $regexp, you would need to use double quotes not single quotes.
    my $thing = 'THING';
    my $regexp = "some$thing"; # now $regexp is 'someTHING'
    Maybe the following code will illustrate it all a little better.
    #!/usr/bin/perl -W
    use strict;

    my $thing1 = 'THING';
    my $thing2 = '$thing';

    sub test {
        my $regex = shift;
        local $_ = 'some thing someTHING some$thing';
        printf " %15s =~ %s", $regex, $_;
        print " == TRUE , matched [", $&, "]" if /$regex/;
        print "\n";
    }

    test ( 'some$thing1' );
    test ( "some$thing1" );
    test ( 'some$thing2' );
    test ( "some$thing2" );
    test ( quotemeta 'some$thing1' );
    test ( quotemeta "some$thing1" );
    test ( quotemeta 'some$thing2' );
    test ( quotemeta "some$thing2" );
Re: variable interpolation in regexps
by Locutus (Beadle) on Nov 25, 2006 at 20:12 UTC
    OK, I see I should have told you about the context of my problem right from the beginning, so that you wouldn't have to guess what I might want to do. Sorry!

    So, first of all: I never even thought about using a variable as a variable's name - you don't need to worry about that ;-)

    In fact, some time ago I willingly agreed to offer an introductory Perl course for my colleagues and decided to follow the famous Llama Book. For playing around with regexps, Randal, Tom and brian were kind enough to provide a little test program which you can download from the book's website. I thought it would be nice not to have to hardcode the particular regexp but to read it from STDIN instead. Therefore, I dared to slightly modify (mea culpa, I admit it) the original code to look like this:

    #!/usr/bin/perl -w
    use strict;

    print 'Please enter the RE to test: ';
    chomp( my $regexp = <STDIN> );

    print "Please enter your strings ('QUIT' to exit):\n";
    print 'regex> ';
    while ( <STDIN> ) {
        chomp;
        last if /^QUIT$/;
        if ( /$regexp/ ) {
            print "Match: |$`<$&>$'|\n";
        } else {
            print "No matches.\n";
        }
        print 'regex> ';
    }

    In this situation it seems to be kinda counterproductive to use quotemeta or qr{\Q...\E} on $regexp because input like

    (fred|barney){3}

    is transformed into something that prints as

      \(fred\|barney\)\{3\} or   (?-xism:\(fred\|barney\)\{3\}),

    respectively, and doesn't match strings which are expected to be matched, e.g. fredfredbarney.
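The effect described above can be reproduced in a few lines. This is a sketch; $input stands in for what a user might type at the prompt, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $input  = '(fred|barney){3}';     # what a user might type
my $target = 'fredfredbarney';

my $quoted = quotemeta $input;       # \(fred\|barney\)\{3\}

print "used as a pattern:   ", ($target =~ /$input/  ? 'match' : 'no match'), "\n";
print "after quotemeta:     ", ($target =~ /$quoted/ ? 'match' : 'no match'), "\n";
print "quoted vs. raw text: ", ($input  =~ /$quoted/ ? 'match' : 'no match'), "\n";
```

After quoting, the pattern matches only the typed text itself, parentheses and braces included, which is why it is the wrong tool for a regexp tester.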

    I am fully aware that my modified test program runs into trouble if the user enters a syntactically incorrect regexp. However, - as I said already - I would have expected that if I enter some$thing as RE to test there would be a match on the string some$thing. And since I realized that there isn't I am trying to find a string which is matched in this special test case - or to find out why there exists no such string.
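For the syntactically incorrect case, one common approach is to compile the input inside eval and check the result before using it. The following is a minimal sketch of that idea, not part of the original test program; the helper name compile_or_undef is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled pattern, or undef if $text
# is not a valid regular expression (qr// dies, eval catches it).
sub compile_or_undef {
    my ($text) = @_;
    my $compiled = eval { qr/$text/ };
    return $compiled;
}

for my $text ('(fred|barney){3}', '(fred|barney') {
    if (defined compile_or_undef($text)) {
        print "ok:      $text\n";
    } else {
        print "invalid: $text\n";
    }
}
```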

    Although I learnt a lot from Corion's and Firefly258's answers (just like cdarke I took it for granted that $ stands for EOL only if occurring at the end of a regular expression) you both only told me how to re-define $regexp in order to match with this or that string - because I failed to inform you about the context (I really should have known better since context is nuts-and-bolts in Perl...). However, I tend to conclude from Corion's response to cdarke's posting that there's no chance to read in a string from the default configured STDIN which is matched by a previously entered "RE" some$thing as such strings are always terminated by an EOL, right?

    So let's forget about strings coming from STDIN and let me state the (at least from my poor Initiate point of view) still open question: What value do I have to set $_ to in line 4 of the following little program in order to make it print to STDOUT?

    01 #!/usr/bin/perl -w
    02 use strict;
    03 my $regexp = 'some$thing';
    04 $_ = # PLEASE ENTER YOUR TEST STRING HERE
    05 print "That's it!\n" if /$regexp/m;

    Neither "some\nthing" nor 'some$thing' nor 'some' do the trick.

    P.S. to Corion: If I used qr{\Q...\E} instead of single quotes in line 3 and defined another scalar

      my $thing = 'THING';

    then 'someTHING' solves the problem. That is what I meant when I wrote "double interpolation by means of qr{\Q...\E}".

      Well, the simple fact is nothing can be put into $_ to get the regexp some$thing to match against it because $ is more of a placeholder metacharacter to indicate to the regex engine to make matches around EOL, it doesn't actually denote or match any character per se.

      Under the popular text formats an EOL is denoted by a Carriage Return or Line Feed or both. So, is it possible to try and match an EOL sequence in a string that doesn't contain either of CR or LF or CR/LF (\r, \n, \r\n respectively)? Well, No, it's simply impossible.

      You have to change your regexp to try and match an actual EOL sequence after $ like this.
      # Single-quote delimiters keep Perl from interpolating $. (the input line number).
      my $regexp = qr'some$.thing'sm;   # or even qr'some$\nthing'm
      local $_ = "some\nthing";
      print " matched $& " if /$regexp/;
      You'll notice the use of qr//sm modifiers to get the regexp to work with multiline strings and get . to match newlines. The //sm modifiers are very important as normally, newlines aren't matched by . in regular expressions unless /s is used but we also are working with multiline strings, hence //sm, more info in perlretut.
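The role of each modifier can be seen in isolation with a few small patterns. This is a sketch with made-up variable names; the last check also shows that modifiers belong to the compiled qr// object, so adding /m later at the point of use does not change a pattern that was compiled without it.

```perl
use strict;
use warnings;

my $text = "some\nthing";

# /m lets '$' match before an embedded newline, not only at the very end.
my $plain     = qr/some$/;
my $multiline = qr/some$/m;

# /s lets '.' match a newline.
my $dot   = qr/some.thing/;
my $dot_s = qr/some.thing/s;

printf "%-22s %s\n", 'some$ without /m:',     ($text =~ $plain     ? 'match' : 'no match');
printf "%-22s %s\n", 'some$ with /m:',        ($text =~ $multiline ? 'match' : 'no match');
printf "%-22s %s\n", 'some.thing without /s:', ($text =~ $dot      ? 'match' : 'no match');
printf "%-22s %s\n", 'some.thing with /s:',    ($text =~ $dot_s    ? 'match' : 'no match');

# Adding /m here does not reach inside the already compiled $plain.
my $late_m = ($text =~ /$plain/m) ? 1 : 0;
print "late /m on compiled pattern: ", ($late_m ? 'match' : 'no match'), "\n";
```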

      Not all strings entered from STDIN are newline terminated, e.g. if you were in multi-line mode (via $/ = undef ) and CTRL+D (twice if preceding character wasn't a newline) was used to terminate input, no newline or any character for that matter is appended to the end of the input string.


      perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er
        Aaaah, what a light bulb moment - thank you, Firefly258! I was a real blockhead not to see that as an anchor $ can only match a position between two characters or "between" a character and the beginning/end of a string, just as it is with the anchors \b and \B.

        I must confess I'm still not very familiar with the qr// operator (but be assured that perlretut has been pushed onto my 2do stack ;-) so I tried

        my $regexp = 'some$' . "\nthing";

        and, finally, /$regexp/m matched "some\nthing".
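A quick way to see that the anchor itself consumes nothing, sketched with the pattern from above (the variable names $matched_length and $anchor_end are illustrative): the whole match covers "some", the newline and "thing", while a match of /some$/m alone ends directly before the newline.

```perl
use strict;
use warnings;

my $regexp = 'some$' . "\nthing";   # "some", anchor, newline, "thing"
my $text   = "some\nthing";

# Length of the full match: 4 + 1 + 5 characters.
my $matched_length = ($text =~ /$regexp/m) ? length($&) : -1;
print "matched $matched_length characters\n";

# $+[0] is the offset where the last successful match ended.
my $anchor_end = ($text =~ /some$/m) ? $+[0] : -1;
print "anchor-only match ends at offset $anchor_end\n";
```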

        Thanks again. You Monks are great!