in reply to variable interpolation in regexps

OK, I see I should have told you about the context of my problem right from the beginning, to save you from having to guess what I might want to do. Sorry!

So, first of all: I never even thought about using a variable as a variable's name - you don't need to worry about that ;-)

In fact, some time ago I willingly agreed to offer an introductory Perl course for my colleagues and decided to follow the famous Llama Book. To let readers play around with regexps, Randal, Tom and brian were kind enough to provide a little test program which you can download from the book's website. I thought it would be nice not to have to hardcode the particular regexp but to read it from STDIN instead. Therefore, I dared to slightly modify (mea culpa, I admit it) the original code to look like this:

  #!/usr/bin/perl -w
  use strict;

  print 'Please enter the RE to test: ';
  chomp( my $regexp = <STDIN> );

  print "Please enter your strings ('QUIT' to exit):\n";
  print 'regex> ';
  while ( <STDIN> ) {
      chomp;
      last if /^QUIT$/;
      if ( /$regexp/ ) {
          print "Match: |$`<$&>$'|\n";
      }
      else {
          print "No matches.\n";
      }
      print 'regex> ';
  }

In this situation it seems to be kinda counterproductive to use quotemeta or qr{\Q...\E} on $regexp because input like

(fred|barney){3}

is transformed into something that prints as

  \(fred\|barney\)\{3\} or   (?-xism:\(fred\|barney\)\{3\}),

respectively, and doesn't match strings which are expected to be matched, e.g. fredfredbarney.
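To make the difference concrete, here is a minimal sketch (the variable names are made up for illustration) that tests the same input once as a pattern and once as quoted literal text:

```perl
use strict;
use warnings;

my $input  = '(fred|barney){3}';
my $string = 'fredfredbarney';

# Interpolated as-is, the input is compiled as a regular expression.
my $as_pattern = ( $string =~ /$input/ ) ? 1 : 0;

# Inside \Q...\E every metacharacter is escaped, so only the literal
# text of the input itself can match.
my $as_literal   = ( $string =~ /\Q$input\E/ ) ? 1 : 0;
my $literal_self = ( $input  =~ /\Q$input\E/ ) ? 1 : 0;

print "quoted form:           ", quotemeta($input), "\n";
print "matches as pattern:    $as_pattern\n";
print "matches as literal:    $as_literal\n";
print "input matches itself:  $literal_self\n";
```

So quoting is the right tool when the user's input should be searched for literally, and the wrong one when it is meant to be a regexp, as in this test program.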

I am fully aware that my modified test program runs into trouble if the user enters a syntactically incorrect regexp. However, as I said already, I would have expected that if I enter some$thing as the RE to test, there would be a match on the string some$thing. Since I realized that there isn't, I am trying to find a string which is matched in this special test case, or to find out why no such string exists.
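As an aside on the syntactically incorrect case: one possible way to guard against it, sketched here under the assumption that rejecting the input is acceptable, is to compile the regexp inside eval before using it. The helper name compile_or_undef is hypothetical and not part of the book's program.

```perl
use strict;
use warnings;

# compile_or_undef is a hypothetical helper, not part of the original program.
# qr// dies on a malformed pattern; the eval block traps that and yields undef.
sub compile_or_undef {
    my ($source) = @_;
    my $re = eval { qr/$source/ };
    return $re;
}

for my $source ( '(fred|barney){3}', '(fred|barney' ) {
    if ( defined compile_or_undef($source) ) {
        print "usable regexp:      $source\n";
    }
    else {
        print "not a valid regexp: $source\n";
    }
}
```

In the interactive program, the same check could be applied right after reading $regexp from STDIN.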

Although I learnt a lot from Corion's and Firefly258's answers (just like cdarke I took it for granted that $ stands for EOL only if occurring at the end of a regular expression) you both only told me how to re-define $regexp in order to match with this or that string - because I failed to inform you about the context (I really should have known better since context is nuts-and-bolts in Perl...). However, I tend to conclude from Corion's response to cdarke's posting that there's no chance to read in a string from the default configured STDIN which is matched by a previously entered "RE" some$thing as such strings are always terminated by an EOL, right?

So let's forget about strings coming from STDIN and let me state the (at least from my poor Initiate point of view) still open question: What value do I have to set $_ to in line 4 of the following little program in order to make it print to STDOUT?

  01 #!/usr/bin/perl -w
  02 use strict;
  03 my $regexp = 'some$thing';
  04 $_ = # PLEASE ENTER YOUR TEST STRING HERE
  05 print "That's it!\n" if /$regexp/m;

Neither "some\nthing" nor 'some$thing' nor 'some' does the trick.

P.S. to Corion: If I used qr{\Q...\E} instead of single quotes in line 3 and defined another scalar

  my $thing = 'THING';

then 'someTHING' solves the problem. That is what I meant when I wrote "double interpolation by means of qr{\Q...\E}".

Replies are listed 'Best First'.
Re^2: variable interpolation in regexps
by Firefly258 (Beadle) on Nov 25, 2006 at 23:13 UTC
    Well, the simple fact is that nothing can be put into $_ to get the regexp some$thing to match against it, because $ is an anchor: it tells the regex engine to match at an end-of-line position, and it doesn't actually denote or match any character per se.

    Under the popular text formats an EOL is denoted by a Carriage Return, a Line Feed, or both. So, is it possible to match an EOL sequence in a string that contains none of CR, LF or CR/LF (\r, \n, \r\n respectively)? Well, no, it's simply impossible.
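A small sketch of the point about $ not consuming anything (the variable names are illustrative only): after the anchor succeeds, the newline is still waiting to be matched.

```perl
use strict;
use warnings;

my $text    = "some\nthing";
my $pattern = 'some$thing';    # kept in a string so Perl does not look for a variable $thing

# With /m, $ succeeds at the position just before the embedded newline...
my $anchor_ok  = ( $text =~ /some$/m ) ? 1 : 0;
my $match_end  = $+[0];        # offset where that match ended
my $next_is_nl = ( substr( $text, $match_end, 1 ) eq "\n" ) ? 1 : 0;

# ...but it consumes nothing, so the next thing to match is the newline, not "t".
my $full_ok = ( $text =~ /$pattern/m ) ? 1 : 0;

print "anchor matched: $anchor_ok, match ended at offset $match_end\n";
print "next character is a newline: $next_is_nl\n";
print "some\$thing matched: $full_ok\n";
```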

    You have to change your regexp to try and match an actual EOL sequence after $ like this.
    # single-quote delimiters keep Perl from interpolating $. (the input line number)
    my $regexp = qr'some$.thing'sm;    # or even qr'some$\nthing'm
    local $_ = "some\nthing";
    print " matched $& " if /$regexp/;
    You'll notice the sm modifiers on qr. Both are important here: normally . doesn't match a newline unless /s is used, and since we are working with a multiline string we also need /m so that $ can match before the embedded newline rather than only at the end of the string. More info in perlretut.
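To see what each modifier contributes, this sketch tries the same pattern text with every combination of /s and /m. The pattern is held in a single-quoted string (an illustrative choice) so that it reaches the regex engine unchanged.

```perl
use strict;
use warnings;

my $text = "some\nthing";
my $src  = 'some$.thing';    # anchor, then any character, then "thing"

# The modifiers on qr// apply to the interpolated pattern text.
my %matched = (
    'none' => ( $text =~ qr/$src/   ) ? 1 : 0,
    's'    => ( $text =~ qr/$src/s  ) ? 1 : 0,
    'm'    => ( $text =~ qr/$src/m  ) ? 1 : 0,
    'sm'   => ( $text =~ qr/$src/sm ) ? 1 : 0,
);

for my $flags ( 'none', 's', 'm', 'sm' ) {
    printf "%-4s => %s\n", $flags, $matched{$flags} ? 'match' : 'no match';
}
```

Only the combination of both modifiers should match: /m is needed for $ to succeed before the embedded newline, and /s for . to then consume it.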

    Not all strings entered from STDIN are newline-terminated. For example, if you were reading multi-line input in one go (via $/ = undef) and CTRL+D (twice, if the preceding character wasn't a newline) was used to terminate input, no newline, or any other character for that matter, is appended to the end of the input string.
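A rough illustration of that last point, using an in-memory filehandle as a stand-in for STDIN (an assumption made only so the example runs without typing anything):

```perl
use strict;
use warnings;

my $data = "some\nthing";    # note: no newline after "thing"

# An in-memory filehandle stands in for STDIN here.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @lines = <$fh>;           # line mode: one element per line
close $fh;

open $fh, '<', \$data or die "cannot open in-memory file: $!";
my $slurped = do { local $/; <$fh> };    # $/ undefined: read everything at once
close $fh;

printf "line mode read %d lines; last line ends in newline: %s\n",
    scalar @lines, ( $lines[-1] =~ /\n\z/ ? 'yes' : 'no' );
printf "slurped %d characters in one string\n", length $slurped;
```

In slurp mode the embedded newline stays inside the string, so a pattern that expects a newline in the middle has something to match against.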


    perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er
      Aaaah, what a light bulb moment - thank you, Firefly258! I was a real blockhead not to see that, as an anchor, $ can only match a position between two characters, or "between" a character and the beginning/end of a string, just as the anchors \b and \B do.

      I must confess I'm still not very familiar with the qr// operator (but be assured that perlretut has been pushed onto my 2do stack ;-) so I tried

      my $regexp = 'some$' . "\nthing";

      and, finally, /$regexp/m matched "some\nthing".
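For the record, a short sketch that runs the three test strings from the original question against both the original pattern and the modified one (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $original = 'some$thing';           # "thing" can never directly follow the anchor
my $fixed    = 'some$' . "\nthing";    # anchor, then a literal newline, then "thing"

my @candidates = ( "some\nthing", 'some$thing', 'some' );

for my $candidate (@candidates) {
    my $o = ( $candidate =~ /$original/m ) ? 'match' : 'no match';
    my $f = ( $candidate =~ /$fixed/m )    ? 'match' : 'no match';

    # Show the newline as \n so each result stays on one output line.
    ( my $shown = $candidate ) =~ s/\n/\\n/g;
    print "$shown: original $o, fixed $f\n";
}
```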

      Thanks again. You Monks are great!