weird regex problem

toadi has asked for the wisdom of the Perl Monks concerning the following question:

Re: weird regex problem by davorg (Chancellor) on Jun 14, 2001 at 13:32 UTC
Well, there are quite a few typos in your code which mean that it doesn't compile, but I assume they are just transcription errors. Fixing them and running your code, I see that nothing is output, as you describe.

The problem is in the line `if ( $contents =~ /(line2-)(\w)/g ) {`

The `/g` is unnecessary here and is causing the expression to evaluate as false. Removing the `/g` makes the code work as expected.

The `/g` option matches the regex as often as it can against your string. The final time it tries to match, the match fails and the operator returns false. Without the `/g` the match only takes place the one time it needs to succeed and the operator returns true.

Update: Yeah. As others in this thread point out, I had the right fix, but the wrong explanation. Should drink more coffee before posting :)

--
<http://www.dave.org.uk>
Perl Training in the UK <http://www.iterative-software.com>
Re: Re: weird regex problem by Hofmator (Curate) on Jun 14, 2001 at 15:57 UTC
Right fix, davorg, but wrong explanation :). Here is the relevant code section again:

```perl
if ( $contents =~ /line3/g ) {
    if ( $contents =~ /(line2-)(\w)/g ) {
        print $2;
    }
}
```

`$contents` holds the slurped file:

```
line1-11
line2-12
line3-13
```

The first `if` matches 'line3' and returns true. The second match picks up at the position where the first match left off (because it also has the `/g` modifier) and fails (!) because line2 comes before line3. So removing the `/g` modifier on the second `if` solves the problem, as the match is now done from the start of `$contents`. As a matter of fact, the `/g` modifier can be left out for both matches.

Some further optimisations I would suggest for this regex:

- Leave out the capturing brackets for `(line2-)`
- Use multiline matching (`/m`)
- Compile the patterns only once (`/o`)

This then leads to the following code:

```perl
if ( $contents =~ /^line3/mo ) {
    if ( $contents =~ /^line2-(\w)/mo ) {
        print $1;    # has to be changed as well
    }
}
```

To clarify the `/g` modifier a little bit further, let's take a look at this code (of course see also perlre and perlop):

```perl
my $string = "abcde abcde adcde";
while ($string =~ /cd/og) {
    print "pos = ", pos $string, "\n";
    if ($string =~ /a(.)/ocg) {
        print "a$1 matched at ", pos $string, "\n";
    }
}
```

Here the first match happens in a while loop but, importantly, still in scalar context. The inner match starts at the position where the first left off, as it also has the `/g` modifier. Then the outer match takes its turn again, starting where the inner one left off.

The position in the string is only reset when a match fails. This does not happen when the `/c` modifier is given. That is necessary for the second match in this case, otherwise there is an infinite loop.

Taking this code and playing a bit with the modifiers and the string helps a lot in understanding these (not so easy) things. And I haven't even started talking about `m//g` in list context yet ...

-- Hofmator
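A minimal sketch of the behaviour described above, for anyone who wants to reproduce it. It assumes `$contents` holds the three sample lines separated by newlines; the helper `inner_capture` is a hypothetical name made up for this illustration, not part of the original code.

```perl
use strict;
use warnings;

# Assumed file contents: the three sample lines quoted in the post.
my $contents = "line1-11\nline2-12\nline3-13\n";

# Hypothetical helper: runs the nested matches with or without /g on the
# inner match and returns the captured character, or undef on failure.
sub inner_capture {
    my ($use_g) = @_;
    pos($contents) = undef;             # start every run from a clean position
    if ( $contents =~ /line3/g ) {      # scalar //g leaves pos() just after "line3"
        if ($use_g) {
            return $1 if $contents =~ /line2-(\w)/g;    # search starts at pos()
        }
        else {
            return $1 if $contents =~ /line2-(\w)/;     # search starts at offset 0
        }
    }
    return;
}

for my $use_g ( 1, 0 ) {
    my $got = inner_capture($use_g);
    printf "%-12s %s\n",
        ( $use_g ? "with /g:" : "without /g:" ),
        ( defined $got ? $got : "no match" );
}
```

With `/g` on the inner match nothing is captured, because the search begins after "line3". Without it, the inner match finds "line2-" and captures the single character "1".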
Re: weird regex problem by japhy (Canon) on Jun 14, 2001 at 16:02 UTC
japhy at YAPC is still vigilant enough to find and answer your regex questions `;)`

The problem is, as davorg pointed out, the `/g` modifier -- but I don't think he pointed it out the right way. If you match on a string in scalar context with `m//g`, then the next `m//g` match on that string (assuming you've not futzed with `pos()` or modified the string or such) will start looking where the last one left off. That means that this code:

```perl
$_ = "c a";
if (/a/g and /c/g) { ... }
```

won't match, since after the "a" is matched, the next regex starts AFTER the "c". Removing the `/g` will make things work as expected.

`japhy` -- Perl and Regex Hacker
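The same example as a runnable sketch, with `pos()` printed so the starting point of the second match is visible. The helper name `both_match` is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if both patterns match "c a" when chained
# as in the post, 0 otherwise.
sub both_match {
    my ($use_g) = @_;
    my $str = "c a";
    if ($use_g) {
        # The second search starts at pos(), which is past the "c".
        return ( $str =~ /a/g and $str =~ /c/g ) ? 1 : 0;
    }
    # Without /g every search starts at offset 0.
    return ( $str =~ /a/ and $str =~ /c/ ) ? 1 : 0;
}

my $str = "c a";
if ( $str =~ /a/g ) {
    # "a" sits at offset 2, so the next //g search starts at offset 3.
    print "pos after /a/g: ", pos($str), "\n";
}

print "with /g:    ", ( both_match(1) ? "both matched" : "second match failed" ), "\n";
print "without /g: ", ( both_match(0) ? "both matched" : "second match failed" ), "\n";
```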
Re: weird regex problem by dimmesdale (Friar) on Jun 14, 2001 at 22:10 UTC
The right solution has been offered... but I'm a little puzzled:

```perl
#THIS LINE BELOW!
if ( $contents =~ /^line2-(\w)/mo ) {
    print $1;    # has to be changed as well
}
```

Why do you use `\w`? First, if all you have is 2-11, 2-12, etc., then a `\d` is by far better. Second, by having the `*` you allow this line to be matched: `line2-`. I assume you do not wish that to happen, so add `\d+` (unless you really need the `\w`).

Also, as to Hofmator's optimization, I disagree. The `/o` will have NO effect, since the variable is not in the regex, but is bound to it. If it were `$contents =~ /$line/` then it would be an optimization, but now it has no effect, and only muddies the waters, so to say.

Also, I would speculate that the `/m` is useless in this situation. The format is:

```
line1: /stuff here/
line2-2: /more stuff/
. . .
```

Therefore, I assume from what he gave us that the information appears only at the beginning of the lines, and it would not make sense for it to span multiple lines; therefore taking away the `/m` would be better (not to mention `/m` doesn't optimize it, but slightly detracts from it).

UPDATE: By the way, to optimize it further, you should add the caret to the regex: `/^line1/`, if I am correct in that line1 is at the beginning of the line.
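A small sketch for comparing the capture variants discussed here. It assumes `$contents` is the slurped three-line sample from earlier in the thread; `capture` is a hypothetical helper name. Note that on such a multi-line string, `^` without `/m` anchors only at the very start of the string, so whether `/m` can be dropped depends on how the data is actually read.

```perl
use strict;
use warnings;

# Assumed sample data: the three lines quoted earlier in the thread.
my $contents = "line1-11\nline2-12\nline3-13\n";

# Hypothetical helper: returns what the pattern captures in $1, or undef.
sub capture {
    my ($re) = @_;
    return $contents =~ $re ? $1 : undef;
}

my @cases = (
    [ '/^line2-(\w)/m',  qr/^line2-(\w)/m ],     # one word character only
    [ '/^line2-(\d+)/m', qr/^line2-(\d+)/m ],    # the whole number
    [ '/^line2-(\d+)/',  qr/^line2-(\d+)/ ],     # ^ anchors at string start only
    [ '/^line1-(\d+)/',  qr/^line1-(\d+)/ ],     # line1 is at the string start
);

for my $case (@cases) {
    my ( $label, $re ) = @$case;
    my $got = capture($re);
    printf "%-18s %s\n", $label, ( defined $got ? $got : "no match" );
}
```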