in reply to qr() match order with multiple patterns

The problem is that for a given line with matches for both $MMDDYYYY and $DDMM the pattern for $DDMM is reported even though the text that matches $MMDDYYYY is first in the line.
Drop the /g modifier in your match condition and you should get the desired results (i.e. $MMDDYYYY should match first).
HTH

_________
broquaint

Re: Re: qr() match order with multiple patterns
by gnu@perl (Pilgrim) on Jul 23, 2003 at 13:51 UTC
    Whoops! Kind of a typo, I was trying something different and forgot about that. I just went and changed it but I am still having the problem.

    Here are two example lines I am examining:

    F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr
    USL_111406585_P085_A87_030723

    The first line, the date is 07222003, but it reports the 1114 that is found just after the 'F' in the beginning of the line. If in my code I take out the last $MMDD I then get '1406' returned for both lines. If I now take out $DDMM as well I then get 072203 for line 1 and a statement that there is no match for line 2. Now I take out $MMDDYY and I finally get 07222003 for line one and a statement that line 2 does not match.

    It appears as if the match is happening from right to left, but that just didn't make sense to me.

      The first line, the date is 07222003, but it reports the 1114 that is found just after the 'F' in the beginning of the line
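      Correct, as that matches $MMDD. This is because the regex engine tries the alternatives at each position in the string, working left to right, and stops at the first position where any of them matches. Nothing matches at the leading 'F', but at the next character $MMDD matches 1114, so that's the first date to be returned. The order of the alternatives only decides between patterns that match at the same position. For some really detailed output on the workings of this regex behaviour, try adding use re 'debug' to the top of your script to see exactly what the regex engine is doing at every step.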
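
      To see the leftmost-match rule in isolation, here is a minimal sketch. It uses cut-down versions of the patterns (only $MMDDYYYY and $MMDD, with the year hard-coded as 2003, which is an assumption for illustration) against the first example line. Even though $MMDDYYYY is listed first, $MMDD wins because it matches at an earlier position.

```perl
use strict;
use warnings;

# Cut-down patterns for illustration only; the year is hard-coded.
my $month    = qr{0[1-9]|1[012]};
my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $MMDDYYYY = qr{$month$dom(?:2003)};
my $MMDD     = qr{$month$dom};

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# $MMDDYYYY is the first alternative, but no alternative is preferred
# over an earlier starting position in the string.
my ($found) = $line =~ /($MMDDYYYY|$MMDD)/;
my $offset  = $-[1];    # start offset of capture group 1

print "matched $found at offset $offset\n";    # matched 1114 at offset 1
```

      The full date 07222003 sits much further along the line, so the engine never gets there.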

      Alternation will not do what you want in this particular case. What you probably want instead is code that tries a list of regexes against the string in turn, where the order of the list gives the precedence of each regex, e.g.

          # building blocks
          my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
          my $month    = qr{0[1-9]|1[012]};
          my $fourYear = qr{2003};
          my $twoYear  = qr{03};

          # date formats
          my $MMDDYYYY = qr{$month$dom$fourYear};
          my $DDMMYYYY = qr{$dom$month$fourYear};
          my $YYYYMMDD = qr{$fourYear$month$dom};
          my $DDMMYY   = qr{$dom$month$twoYear};
          my $MMDDYY   = qr{$month$dom$twoYear};
          my $MMDD     = qr{($month$dom)};
          my $DDMM     = qr{$dom$month};

          # highest precedence first
          my @date_regexes = (
              $MMDDYYYY, $DDMMYYYY, $YYYYMMDD,
              $DDMMYY,   $MMDDYY,
              $MMDD,     $DDMM,
          );

          my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

          print "date is - ", match_precedence(\@date_regexes, $line), $/;

          # return the text matched by the first regex in the list that
          # matches anywhere in the string
          sub match_precedence {
              my($regs, $str) = @_;
              for(@$regs) {
                  return $1 if $str =~ /($_)/
              }
              return;
          }

          __output__

          date is - 07222003

      That's not great code, but hopefully it'll give you something to work with.

      Update - well, you can use a regex with alternation, but it ain't pretty

          my $date_regex = qr{
              (?: .*($MMDDYYYY)|.*($DDMMYYYY)|.*($YYYYMMDD)|.*($DDMMYY)|
                  .*($MMDDYY)|.*($MMDD)|.*($DDMM) )
          }x;

      Shudder, backtracking hell basically. It'll work, but it'll be hugely slow on big strings. The above regex is really for "you can do it" purposes, so I wouldn't advise using it!
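
      For the curious, here is a rough sketch of how the result could be pulled out of a regex like that. It uses only three of the formats to keep it short, with the same hard-coded year, and the grep over the capture list is just one assumed way of finding the group that matched. Only one group is ever set, because each alternative has its own capture.

```perl
use strict;
use warnings;

# Reduced set of patterns for illustration only.
my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $month    = qr{0[1-9]|1[012]};
my $fourYear = qr{2003};
my $twoYear  = qr{03};

my $MMDDYYYY = qr{$month$dom$fourYear};
my $MMDDYY   = qr{$month$dom$twoYear};
my $MMDD     = qr{$month$dom};

# The leading .* lets each alternative scan the whole string before the
# next alternative is tried, so list order acts as precedence.
my $date_regex = qr{
    (?: .*($MMDDYYYY) | .*($MMDDYY) | .*($MMDD) )
}x;

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# A list-context match returns every capture group; the groups belonging
# to alternatives that did not match are undef.
my ($date) = grep { defined } $line =~ $date_regex;

print "date is - $date\n";    # date is - 07222003
```

      Note that the greedy .* means each alternative finds its last occurrence in the string rather than its first, which is another reason to prefer the loop over a list of regexes.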
      HTH

      _________
      broquaint

        I see, I was looking at it wrong. In the way I stated it, it basically says look at all these patterns (in $Date) and report what matches. So it looks at the first character, compares it to all the regexes in $Date, nothing matches, next looks at the first and second characters, compares the regexes in $Date and finds nothing matches. Does the same for characters 1-3 and once it checks characters 1-4 it finds that one of the regexes in $Date does match and it quits.

        I was looking at it in the wrong way. Thanks for clearing that up.