tlhackque has asked for the wisdom of the Perl Monks concerning the following question:

When using /m with a regexp, $ becomes particularly ambiguous; it means both 'end of line' and 'interpolate a variable'. (Although this is true without /m, it causes me grief much less often in that case.) I've never seen the rules documented for how Perl disambiguates. I've looked. What I find is that they're explained as "magic"; "Perl usually does what you mean."

What are the rules? And what's the best idiom for influencing the outcome?

For example, consider /foo$.*^bar/ms - is this $. (the variable) or EOL followed by random stuff?

I find myself introducing non-capturing parens, as it seems that if a paren can be matched it won't be used as a variable name.

A more realistic example:
$QQ->get_private_key_string =~ /(^-----BEGIN (?:(RSA) )?PRIVATE KEY-----$(?:.*)^-----END (?:RSA )?PRIVATE KEY-----$(?:\s*))/ms;

This gets quite entertaining, considering that $. $( and $) are variables.

It would be nice to have a deterministic way to code these cases. And may I suggest that this should be documented?

Thanks.

This communication may not represent my employer's views, if any, on the matters discussed.

Re: Disambuating $ in (especially /m) regexps
by Eily (Monsignor) on Jan 05, 2016 at 12:44 UTC

    There's also the option of using /x, having a space after the $ will remove the ambiguity for the compiler and anyone reading the code.

      I like that unconventional use of /x. Thanks.
        You must be careful not to match lines that are not consecutive.
        use strict;
        use warnings;

        my $string = "foo LAST\n"
                   . " blftz\n"
                   . "LINE fum\n";

        print "MATCH\n" if ($string =~ m/LAST$ .* ^LINE/xms);

        You could solve this with:

        print "MATCH\n" if ($string =~ m/LAST$ ..? ^LINE/xms) ;
        Bill
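
A minimal sketch of the point made in this sub-thread, assuming the same sample text as above. The variable names and the second test string are made up for illustration. It compares the /x pattern with `.*` against one that consumes exactly one newline, so only adjacent lines match. The space after `$` is what keeps it an anchor here.

```perl
use strict;
use warnings;

# With /x, whitespace in the pattern is ignored by the regex engine,
# but "$ " is still not treated as a variable, so $ stays an anchor.
my $re_any  = qr/LAST $ .* ^ LINE/xms;   # anything may sit between the two lines
my $re_next = qr/LAST $ \n ^ LINE/xms;   # exactly one newline between the two lines

my $gap      = "foo LAST\n blftz\nLINE fum\n";   # a line in between
my $adjacent = "foo LAST\nLINE fum\n";           # consecutive lines

my %result = (
    any_gap       => ($gap      =~ $re_any)  ? 1 : 0,
    any_adjacent  => ($adjacent =~ $re_any)  ? 1 : 0,
    next_gap      => ($gap      =~ $re_next) ? 1 : 0,
    next_adjacent => ($adjacent =~ $re_next) ? 1 : 0,
);

printf "%-14s %d\n", $_, $result{$_} for sort keys %result;
```

Only `next_gap` should come out as 0: the `.*` version happily skips the intervening line, while the `\n` version does not.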
Re: Disambuating $ in (especially /m) regexps
by choroba (Cardinal) on Jan 05, 2016 at 13:14 UTC
    Does it make any sense to match the character after $? It must be a newline. Similarly, the character before ^ must be a newline. So, use \n, and use the :crlf layer if needed to avoid the need to handle \r.

    But I'd be interested to know the rules for parsing the regexes, too.
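
A rough sketch of the \n suggestion applied to the private-key pattern from the question. The key text is a placeholder, not real key material, and `$pem`, `$type` and `$body` are hypothetical names. With no `$` in the pattern there is nothing left for the parser to disambiguate.

```perl
use strict;
use warnings;

# Placeholder input; a real key body would be several base64 lines.
my $pem = "-----BEGIN RSA PRIVATE KEY-----\n"
        . "PLACEHOLDERBASE64\n"
        . "-----END RSA PRIVATE KEY-----\n";

# Each end-of-line is matched as a literal \n, so no $ appears at all.
# ^ still relies on /m to match at the start of the END line.
my ($type, $body) = $pem =~ /
    ^-----BEGIN\ (?:(RSA)\ )?PRIVATE\ KEY-----\n
    (.*?)
    ^-----END\ (?:RSA\ )?PRIVATE\ KEY-----\n
/xms;

printf "type=%s\nbody=%s", $type // 'none', $body // '';
```

This assumes the input uses bare \n line endings; as noted above, the :crlf layer is one way to get there when the data arrives with \r\n.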

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Disambuating $ in (especially /m) regexps
by Your Mother (Archbishop) on Jan 05, 2016 at 12:26 UTC

    FWIW, there are also \z and \Z available. I agree with anonymonk that some version of \r?\n is better than trying to use $ everywhere.

      Thanks. I know I can use an explicit \n (\z and \Z are end of string, so they're not useful in the middle of /m patterns). But \n is a character; $ is a position assertion that is true before it. I usually want the position. And by the time one deals with "or end of string", it's a lot of characters in the regex, but not a lot of clarity, e.g. (?:  )-the-position-[\n\z]? (leaving aside the \r, which $ doesn't handle).

      The doc for /m says: "Let ^ and $ match next to embedded \n."

      So while TMTOWTDI, I'd still like to understand how $ gets disambiguated, and hopefully some simple rule for keeping it straight.

Re: Disambuating $ in (especially /m) regexps
by Anonymous Monk on Jan 05, 2016 at 16:26 UTC
    As per perlop:
    Interpolation in patterns has several quirks: $|, $(, $), "@+" and "@-" are not interpolated, and constructs $var[SOMETHING] are voted (by several different estimators) to be either an array element or $var followed by an RE alternative.
    It's in "Gory details of parsing quoted constructs". Therefore, $|, $( and $) are not variables in re patterns. Also note that
    If "'" (single quote) is the delimiter, no interpolation is performed on the PATTERN.
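
One way to check the quoted perlop rules for yourself is to stringify a qr// object, which shows the pattern as it stands after interpolation. The patterns and variable names below are made up for illustration; this is a sketch, not an exhaustive test of the parser.

```perl
use strict;
use warnings;

# "$(" is not interpolated in a pattern: $ stays an anchor before the group.
my $before_group = qr/^foo$(?:\n)bar/m;

# "$)" is not interpolated either: $ stays an anchor inside the group.
my $in_group = qr/(?:^foo$)\n/m;

# "$|" is not interpolated: $ is an anchor, | is alternation.
my $before_pipe = qr/foo$|bar/;

# Single-quote delimiter: no interpolation at all, so "$." survives as
# "end of line, then any character" rather than the $. variable.
my $single_quoted = qr'foo$.bar';

print "$_\n" for $before_group, $in_group, $before_pipe, $single_quoted;
```

In each printed pattern the `$` should still be visible, which is the sign that no variable was substituted.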
Re: Disambuating $ in (especially /m) regexps
by Anonymous Monk on Jan 05, 2016 at 11:29 UTC

    Why do you think you need $ at all? especially when combined with /s?

    Just match the line ending if you want it, ...[\r\n]+...

Re: Disambuating $ in (especially /m) regexps
by ikegami (Patriarch) on Jan 06, 2016 at 18:41 UTC

    /foo$.*^bar/ms - is this $. (the variable) or EOL followed by random stuff?

    $ perl -E'say qr/foo$.*^bar/ms'
    (?^ums:foo*^bar)
    It's the variable. Fixed:
    $ perl -E'say qr/foo(?:$).*^bar/ms'
    (?^ums:foo(?:$).*^bar)