in reply to When exactly do Perl regex's require a full match on a string?

Zero-width assertions like ^, $, \A, \Z, \z, \b, \B and \G match at a position in a string; they do not match a character. So $ and \Z will normally match at the position before the newline at the end of the string, unless a) there is no newline at the end of the string, or b) the pattern before the assertion would also match a newline.

$ perl -e'
use Data::Dumper;
$Data::Dumper::Useqq = 1;
for ( "ab\ncd", "ab\ncd\n" ) {
    /\w*$/   && print Dumper $&;
    /\w*\Z/  && print Dumper $&;
    /.*$/    && print Dumper $&;
    /.*\Z/   && print Dumper $&;
    /.*$/m   && print Dumper $&;
    /.*\Z/m  && print Dumper $&;
    /.*$/s   && print Dumper $&;
    /.*\Z/s  && print Dumper $&;
    print "\n";
    }
'
$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "ab";
$VAR1 = "cd";
$VAR1 = "ab\ncd";
$VAR1 = "ab\ncd";

$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "cd";
$VAR1 = "ab";
$VAR1 = "cd";
$VAR1 = "ab\ncd\n";
$VAR1 = "ab\ncd\n";

Also, you are using the /s modifier, which only affects whether the . metacharacter will match a newline, and you are not using the . metacharacter in your patterns.
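To make the point about /s concrete, here is a minimal sketch (the variable names are mine, purely illustrative). It runs the same capture with and without /s, once using \w* and once using a dot, and adds /m for contrast, since /m is the modifier that changes where $ may match.

```perl
use strict;
use warnings;

# Illustrative test string: two lines, with a trailing newline.
my $str = "ab\ncd\n";

# No dot in the pattern, so /s changes nothing:
# \w* cannot cross a newline either way.
my ($plain)  = $str =~ /(\w*)$/;     # "cd"
my ($with_s) = $str =~ /(\w*)$/s;    # "cd"

# With a dot, /s lets .* run across (and swallow) the newlines.
my ($dot)   = $str =~ /(.*)$/;       # "cd"
my ($dot_s) = $str =~ /(.*)$/s;      # "ab\ncd\n"

# /m, not /s, is what lets $ match before an internal newline.
my ($dot_m) = $str =~ /(.*)$/m;      # "ab"

# Print each capture with newlines made visible.
for my $pair ( [ '\w*$'  => $plain ], [ '\w*$ /s' => $with_s ],
               [ '.*$'   => $dot   ], [ '.*$ /s'  => $dot_s  ],
               [ '.*$ /m' => $dot_m ] ) {
    ( my $shown = $pair->[1] ) =~ s/\n/\\n/g;
    printf "%-8s => \"%s\"\n", $pair->[0], $shown;
}
```

The captures agree with the Data::Dumper output above for the "ab\ncd\n" case.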

Re^2: When exactly do Perl regex's require a full match on a string?
by ELISHEVA (Prior) on Feb 08, 2009 at 17:35 UTC
    So $ and \Z will normally match at the position before the newline at the end of the string unless a) there is no newline at the end of the string, or b) the pattern before the assertion would also match a newline.

    Zero-widthness was also my starting point, but it is exactly what raised the question I asked. My sense is that Oshalla has it right when he says "absent the m modifier, $ matches end-of-string or just before a newline at end-of-string".

    As the examples below show, absent the m modifier, '$' does not match [before] an internal newline, but it is perfectly happy matching [before] a final newline after an internal newline:

    string=<\n\n>
    no modifier: regex=/$\n\n/ no match => $ needs m modifier to match internal nl
    m modifier (multi line mode): regex=/$\n\n/m match => $ needs m modifier to match internal nl
    string=<\n\n>
    no modifier: regex=/\n$\n/ match => $ matches final nl after internal nl
    m modifier (multi line mode): regex=/\n$\n/m match => $ matches final nl after internal nl
    string=<\n\n>
    no modifier: regex=/\n\n/ match =>
    m modifier (multi line mode): regex=/\n\n/m match =>
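One way to see which newline '$' is willing to sit in front of is to list every offset where it matches. The sketch below is only an illustration: positions() is a helper I made up, and the lookahead spellings are the usual equivalents for $ described in perlre (before a newline at end-of-string or at end-of-string without /m, before any newline or at end-of-string with /m).

```perl
use strict;
use warnings;

my $str = "\n\n";    # same test string as in the output above

# Hypothetical helper: return the offsets at which a zero-width
# pattern matches, as a space-separated string.
sub positions {
    my ( $re, $s ) = @_;
    my @at;
    push @at, $-[0] while $s =~ /$re/g;    # $-[0] is the match start offset
    return "@at";
}

my $dollar      = positions( qr/$/,         $str );    # "1 2"
my $dollar_m    = positions( qr/$/m,        $str );    # "0 1 2"
my $lookahead   = positions( qr/(?=\n?\z)/, $str );    # "1 2"
my $lookahead_m = positions( qr/(?=\n|\z)/, $str );    # "0 1 2"

print "\$ without /m : $dollar\n";
print "\$ with /m    : $dollar_m\n";
print "(?=\\n?\\z)    : $lookahead\n";
print "(?=\\n|\\z)    : $lookahead_m\n";
```

Offset 0 (before the internal newline) only shows up with /m, while offset 1 (before the final newline) is available either way. That is consistent with /$\n\n/ needing /m and /\n$\n/ not needing it.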

    Best, beth

    Update: added [before] to make it clearer that the zero-widthness of '$' wasn't at issue, but rather which newline was being matched by the zero-width '$' - thanks jwkrahn for pointing out that it wasn't clear that was meant.

      A regex like /\n$\n/ doesn't really make any sense, since matching after the end of the string is like asking for the 11th value of a 10-value array. So whether it matches "\n\n" is quite academic. I could live with a perl that does not have a consistent answer for this. The important cases IMO are:

      > perl -e ' $_="a\n"; print "match\n" if (m/^a\n$/); '
      match
      > perl -e ' $_="a\n"; print "match\n" if (m/^a$/); '
      match

      Which means you can match the \n if you want, but you don't need to.
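If you ever do need to rule out the trailing newline, \z is the anchor that only matches at the very end of the string. A small sketch follows; the two sub names are hypothetical, chosen just for this illustration.

```perl
use strict;
use warnings;

# Hypothetical validators, for illustration only.
sub is_a_lenient { return $_[0] =~ /^a$/  ? 1 : 0 }    # tolerates one trailing newline
sub is_a_strict  { return $_[0] =~ /^a\z/ ? 1 : 0 }    # nothing may follow the "a"

for my $input ( "a", "a\n", "a\n\n" ) {
    ( my $shown = $input ) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-6s lenient=%d strict=%d\n",
        qq{"$shown"}, is_a_lenient($input), is_a_strict($input);
}
```

Only a single newline at the very end is forgiven by $, which is why "a\n\n" fails both checks.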

      Here's the problem: With or without the /m modifier, the regex  /\n$\n/ does not match against the  "\n\n" string!
      >perl -wMstrict -le "my $s = qq{\n\n}; print $s =~ /(\n$\n)/ ? qq{:$1:} : 'no match'; "
      no match
      >perl -wMstrict -le "my $s = qq{\n\n}; print $s =~ /(\n$\n)/m ? qq{:$1:} : 'no match'; "
      no match
      The reason is that the $\ sequence in the regex is taken as the Perl special variable $\ (the 'output record separator', which is undef by default but is set to a newline here by the -l switch) and interpolated as such in the regex, which thus becomes equivalent to / \n \n n /x (note the /x modifier).

      If the regex is disambiguated as  / \n $ \n /x (again, note the /x modifier), the regex matches both with and without the /m modifier.

      >perl -wMstrict -le "my $s = qq{\n\n}; print $s =~ /( \n $ \n )/x ? qq{:$1:} : 'no match'; "
      :

      :
      >perl -wMstrict -le "my $s = qq{\n\n}; print $s =~ /( \n $ \n )/xm ? qq{:$1:} : 'no match'; "
      :

      :
      In many of the examples in other replies in this thread, the ambiguity of  $\ in a regex that arises from interpolation is not taken into account and causes (or can cause) confoosion.
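For anyone who wants to check the interpolation claim in a script rather than a one-liner, here is a rough sketch. It sets $\ by hand to mimic what -l does; the variable names are just illustrative.

```perl
use strict;
use warnings;

$\ = "\n";           # what the -l switch does on the command line
my $s = "\n\n";

# Ambiguous spelling: $\ is interpolated, so the pattern is really
# backslash-n, then a literal newline, then the letter "n".
my $ambiguous    = $s       =~ /\n$\n/ ? 1 : 0;    # 0: there is no "n" in $s
my $needs_an_n   = "\n\nn"  =~ /\n$\n/ ? 1 : 0;    # 1: two newlines plus "n"

# Disambiguated with /x and whitespace: $ is the anchor again.
my $spaced   = $s =~ / \n $ \n /x  ? 1 : 0;        # 1
my $spaced_m = $s =~ / \n $ \n /xm ? 1 : 0;        # 1

# print appends $\ itself, so no explicit "\n" is needed here.
print "ambiguous /\\n\$\\n/ vs two newlines : $ambiguous";
print "ambiguous /\\n\$\\n/ vs newlines + n : $needs_an_n";
print "spaced, /x                         : $spaced";
print "spaced, /xm                        : $spaced_m";
```

The second test is the telling one: the ambiguous pattern only matches once the string actually contains the stray "n".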

      Update: Consider the following misleading output from the OP:

      string=<a\n>
      no modifier: regex=/^a$\n/ match => $ matches only boundary, \n matches newline
      [ ... ]
      m modifier (multi line mode): regex=/^a$\n/m match => $ matches only boundary, \n matches newline
      In fact, neither regex matches:
      >perl -wMstrict -le "my $s = qq{a\n}; print $s =~ /^a$\n/ ? ' ' : 'NO ', 'match'; print $s =~ /^a$\n/m ? ' ' : 'NO ', 'match'; "
      NO match
      NO match
      The reason for the confusion is that the regex is first defined as  '^a$\n' (i.e., within non-interpolating single-quotes) in the test code, then interpolated within the actual  // regex operator, in which case the  $\ sequence is not ultimately interpolated as the output record separator string.
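The difference between the two routes can be reproduced side by side. This is a sketch under the assumption that the OP's test code built its patterns roughly this way; $pattern and the result variables are my own names.

```perl
use strict;
use warnings;

$\ = "\n";           # what the -l switch does on the command line
my $s = "a\n";

# Pattern text built in single quotes: the five characters ^ a $ \ n,
# with nothing interpolated.
my $pattern = '^a$\n';

# Interpolating $pattern inserts that text as-is. It is not scanned a
# second time for variables, so the regex engine sees ^a, then the $
# anchor, then \n.
my $via_variable = $s =~ /$pattern/ ? 1 : 0;    # 1

# Literal spelling: $\ is interpolated before the regex is compiled,
# giving ^a, a literal newline, and the letter "n".
my $literal = $s =~ /^a$\n/ ? 1 : 0;            # 0

print "pattern held in a variable : ", ( $via_variable ? 'match' : 'NO match' );
print "pattern written literally  : ", ( $literal      ? 'match' : 'NO match' );
```

So a report generated from single-quoted pattern strings can say "match" for a regex that fails when typed directly between the slashes.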

      Again, after appropriate disambiguation, everything's fine:

      >perl -wMstrict -le "my $s = qq{a\n}; print $s =~ /^ a $ \n/x ? ' ' : 'NO ', 'match'; print $s =~ /^ a $ \n/xm ? ' ' : 'NO ', 'match'; "
       match
       match
      As the examples below show, absent the m modifier, '$' does not match an internal newline, but it is perfectly happy matching a final newline after an internal newline:

      Therein may lie your problem.   A newline is a character and  $ is a zero-width assertion which will never match a character.    :-)