Latro has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

Been doing some work to parse some kind-of-log files, and in doing so I got the idea that 5.10 named capture buffers in regexes are exactly what I need for a quick "give me a category for an error depending on which part of this regex matched" thing.
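A minimal sketch of that idea, assuming Perl 5.10 or later: each alternative gets its own named group, and %+ only has keys for the groups that actually captured, so the key tells you the category. Only the DCSE alternative comes from this post; the DATE group, its pattern, and the categorize() helper are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical classifier: DCSE is the group from the post, DATE is made up.
my $classifier = qr/
    (?<DCSE>AD[^A]|NA|VN3)
  | (?<DATE>Closing\s+date\s+before\s+today)
/x;

# Returns the name of the group that matched, or undef if nothing did.
sub categorize {
    my ($msg) = @_;
    return unless $msg =~ $classifier;
    # %+ only has keys for named groups that actually captured something,
    # and only one branch of the alternation can match.
    my ($category) = keys %+;
    return $category;
}

for my $msg ('RFT: AD0010043543', 'Closing date before today:2008-04-23', 'all fine') {
    my $cat = categorize($msg);
    print "$msg => ", (defined $cat ? $cat : 'uncategorized'), "\n";
}
```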

So I got to work on it and... weird results happened. The following code demonstrates it:

```perl
#!/usr/bin/perl -l
use Getopt::Std;
getopts( "fq", \%opt );
$line="ERROR 02052008-14:45 EDW08B50 ...ocal/log2000/data/incoming/bk/xp/bk005466.edi B: 1195043779 , RFT: AD0010043543
ERROR 02052008-14:45 EDW08B50 Closing date before today:2008-04-23";
$named_regex = '(?<DCSE>AD[^A]|NA|VN3)\S*';
$named_regex.= '$' if defined($opt{f});
$named_regex=qr/$named_regex/ if defined($opt{q});
$line=~/$named_regex/oms;
print join("=>",%+);
```

Simply put, if you call this program without arguments, it uses the string in $named_regex directly in the match. If you run it with -q, it first passes that string through qr//. The -f option controls whether the regex ends with a $.
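To see all four combinations side by side without re-running the script, here is a minimal sketch that loops over them in one process. It assumes the two log lines are separated by a newline (which the -f result implies), adds strict/warnings, drops /o (inside a loop it would freeze whichever pattern was compiled first), and uses a hypothetical helper dcse_for(). On 5.10 and later it should report AD0 for the first three combinations and no match for -fq, mirroring the output shown in the post.

```perl
use strict;
use warnings;

# The two log lines from the post, joined by a newline (the "..." elision is kept as-is).
my $line = "ERROR 02052008-14:45 EDW08B50 ...ocal/log2000/data/incoming/bk/xp/bk005466.edi B: 1195043779 , RFT: AD0010043543\n"
         . "ERROR 02052008-14:45 EDW08B50 Closing date before today:2008-04-23";

# Hypothetical helper: returns the DCSE capture for one -f/-q combination, or undef.
sub dcse_for {
    my ($f, $q) = @_;
    my $named_regex = '(?<DCSE>AD[^A]|NA|VN3)\S*';
    $named_regex .= '$' if $f;
    # With -q the string is compiled first, and qr// gets no flags of its own.
    my $re = $q ? qr/$named_regex/ : $named_regex;
    # No /o here: in a loop it would freeze whichever pattern was compiled first.
    return $line =~ /$re/ms ? $+{DCSE} : undef;
}

for my $f (0, 1) {
    for my $q (0, 1) {
        my $opts  = join '', ($f ? 'f' : ()), ($q ? 'q' : ());
        my $label = length $opts ? "-$opts" : '(no options)';
        my $got   = dcse_for($f, $q);
        print "$label => ", (defined $got ? $got : '(no match)'), "\n";
    }
}
```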

It should always give the same result, right? It doesn't:

```
# ./test.pl
DCSE=>AD0
# ./test.pl -q
DCSE=>AD0
# ./test.pl -f
DCSE=>AD0
# ./test.pl -fq
#
```

Somehow, that final $ on the string means that everything is OK if $named_regex is just interpolated into the match, but not if it is compiled as a regex first with qr//.

Any ideas? Anything from "it is evident that you are making a huge mistake, see..." to "better report it" is welcome.

Re: Weird bug with qr// and named capture buffers? Or just me?
by almut (Canon) on Jun 17, 2008 at 17:13 UTC

    I think you (also) need to specify the regex options on the qr//, i.e. qr/$named_regex/ms, because when the precompiled regex is used in the subsequent match, the options given there (as in $line=~/$named_regex/oms) are ignored.
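    A minimal sketch of that fix applied to the -fq case, assuming the same two-line input as the question. It shows two variants: the flags on the qr// itself, and an inline (?m) modifier that travels with the pattern text. The first_dcse() helper is hypothetical; both variants should print DCSE=>AD0.

    ```perl
    use strict;
    use warnings;

    # Same two-line input as in the question.
    my $line = "ERROR 02052008-14:45 EDW08B50 ...ocal/log2000/data/incoming/bk/xp/bk005466.edi B: 1195043779 , RFT: AD0010043543\n"
             . "ERROR 02052008-14:45 EDW08B50 Closing date before today:2008-04-23";
    my $named_regex = '(?<DCSE>AD[^A]|NA|VN3)\S*$';    # the -f variant

    # Put the options on the qr// itself; they are baked into the compiled regex.
    my $with_flags = qr/$named_regex/ms;

    # Alternative: an inline (?m) modifier travels with the pattern text.
    my $inline = qr/(?m)$named_regex/;

    # Hypothetical helper: returns the DCSE capture, or undef when the regex does not match.
    sub first_dcse {
        my ($re) = @_;
        return $line =~ $re ? $+{DCSE} : undef;
    }

    for my $re ($with_flags, $inline) {
        my $got = first_dcse($re);
        print defined $got ? "DCSE=>$got\n" : "no match\n";
    }
    ```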

Re: Weird bug with qr// and named capture buffers? Or just me?
by ikegami (Patriarch) on Jun 17, 2008 at 18:44 UTC

    If you don't specify the "m" option on the m//, s/// or qr// that builds a regexp, ^ and $ will act as if you didn't specify it. Absence of "m" does not mean "inherit". For that reason, qr// without "m" turns off the option for the regexp it builds.

    ```
    >perl -le"print qr/foo/"
    (?-xism:foo)
    >perl -le"print qr/foo/m"
    (?m-xis:foo)
    ```
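    The same point as a small script, a sketch assuming core Perl 5.10 or later: the regex compiled without m ignores the outer /m, while qr//m keeps its flag, so it should print "outer /m: 0, qr//m: 1". The stringified forms are printed too, but their notation depends on the Perl version ((?-xism:...) on 5.10, (?^:...) from 5.14 on), so compare only the match results. On 5.8.8 the first result would be affected by the bug mentioned in this reply.

    ```perl
    use strict;
    use warnings;

    my $without_m = qr/a$/;
    my $with_m    = qr/a$/m;

    # e.g. (?-xism:a$) on 5.10, (?^:a$) on 5.14 and later
    print "$without_m\n$with_m\n";

    my $text = "a\nb\n";
    # Outer /m is not inherited: $ cannot match before the first "\n".
    my $outer_m = ($text =~ /$without_m/m) ? 1 : 0;
    # With m baked in at qr// time, $ can match just before any newline.
    my $baked_m = ($text =~ $with_m) ? 1 : 0;
    print "outer /m: $outer_m, qr//m: $baked_m\n";
    ```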

    Until very recently, there was a bug that caused (?-m:...) to be ignored in some circumstances.

    ```
    >perl588 -le"$re=qr/a$/; print $re; print qq{a\nb\n}=~/$re/m ?1:0"
    (?-xism:a$)
    1
    >perl5100 -le"$re=qr/a$/; print $re; print qq{a\nb\n}=~/$re/m ?1:0"
    (?-xism:a$)
    0
    ```

      Oh! I see, it makes sense. The perlop page says that if you use the options on the qr// and then interpolate the variable into a bigger pattern in the m//, the options (msi...) get set for the whole thing too.
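      One way to check how far the options actually reach when a qr// is interpolated into a bigger pattern is a tiny test like this (the strings and patterns are made up for illustration). On 5.10 and later each embedded qr// keeps its own flags, and the outer flags apply only to the surrounding pattern, so it should print "1 0 1 0".

      ```perl
      use strict;
      use warnings;

      my $ci = qr/abc/i;   # case-insensitive part
      my $cs = qr/abc/;    # case-sensitive part

      my @results = (
          "xABCy" =~ /x${ci}y/  ? 1 : 0,   # /i from qr// covers "abc"
          "XABCy" =~ /x${ci}y/  ? 1 : 0,   # but not the surrounding "x"
          "Xabcy" =~ /x${cs}y/i ? 1 : 0,   # outer /i covers "x" and "y"
          "XABCY" =~ /x${cs}y/i ? 1 : 0,   # but not the embedded qr//
      );
      print "@results\n";
      ```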

      But, as you say, it makes sense that if m wasn't specified when building the qr//, the resulting compiled regex will not "accept" it later in the m//.

      So yes, it is my fault and trivial. Even if it is a bit obscure, it should have dawned on me before. It is "compiled", after all :-) Thanks!

        I'll let the developers know about the bad docs.
      We just bumped into this in our codebase, and I've heard a rumor that there's a patch somewhere for 5.10 that generates a warning if someone attempts to use a modifier on an already compiled regexp. Is this just a rumor?