in reply to Re: qr() match order with multiple patterns
in thread qr() match order with multiple patterns

Whoops! Kind of a typo, I was trying something different and forgot about that. I just went and changed it but I am still having the problem.

Here are two example lines I am examining:

F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr
USL_111406585_P085_A87_030723
The first line, the date is 07222003, but it reports the 1114 that is found just after the 'F' in the beginning of the line. If I take the last $MMDD out of my code, I get '1406' returned for both lines. If I then take out $DDMM as well, I get 072203 for line 1 and a statement that there is no match for line 2. If I finally take out $MMDDYY too, I get 07222003 for line 1 and a statement that line 2 does not match.

It appears as if the match is happening from right to left, but that just didn't make sense to me.

Re: Re: Re: qr() match order with multiple patterns
by broquaint (Abbot) on Jul 23, 2003 at 14:08 UTC
    The first line, the date is 07222003, but it reports the 1114 that is found just after the 'F' in the beginning of the line
    Correct, as that matches $MMDD. The regex engine starts at the first position in the string and tries every alternative there before moving on to the next position, so the leftmost match wins no matter how the alternatives are ordered. Because $MMDD matches 1114 right after the 'F', that is the first date to be returned. For really detailed output on how the regex engine works through this, try adding use re 'debug' to the top of your script to see exactly what it is doing at every step.
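A minimal sketch of that leftmost-match behaviour, cut down to two of the patterns from this thread (the definitions here are assumed stand-ins, not the original poster's code). Even with the longer pattern listed first, the alternation returns whatever can match at the earliest position in the string:

```perl
use strict;
use warnings;

# Assumed, simplified stand-ins for the patterns discussed in the thread.
my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $month    = qr{0[1-9]|1[012]};
my $fourYear = qr{2003};
my $MMDDYYYY = qr{$month$dom$fourYear};
my $MMDD     = qr{$month$dom};

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# $MMDDYYYY is listed first, but it cannot match at offset 1, whereas
# $MMDD can, so the engine never gets as far as the full date.
my ($leftmost) = $line =~ /($MMDDYYYY|$MMDD)/;
my $offset = $-[1];    # start offset of capture group 1

print "alternation matched '$leftmost' at offset $offset\n";
```

Reordering the alternatives only changes which one wins when several can match at the same starting position; it does not let a later position beat an earlier one.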

    What you probably want instead of alternation, which will not do what you want in this particular case, is code that matches a given string against a list of regexes, where the order of the list determines the precedence of each regex, e.g.

    use strict;
    use warnings;

    # Building blocks: day of month, month, and the year in long and short form.
    my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
    my $month    = qr{0[1-9]|1[012]};
    my $fourYear = qr{2003};
    my $twoYear  = qr{03};

    my $MMDDYYYY = qr{$month$dom$fourYear};
    my $DDMMYYYY = qr{$dom$month$fourYear};
    my $YYYYMMDD = qr{$fourYear$month$dom};
    my $DDMMYY   = qr{$dom$month$twoYear};
    my $MMDDYY   = qr{$month$dom$twoYear};
    my $MMDD     = qr{$month$dom};
    my $DDMM     = qr{$dom$month};

    # Most specific patterns first: list order is the precedence.
    my @date_regexes = (
        $MMDDYYYY, $DDMMYYYY, $YYYYMMDD,
        $DDMMYY, $MMDDYY, $MMDD, $DDMM,
    );

    my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';
    print "date is - ", match_precedence(\@date_regexes, $line), $/;

    # Return the text matched by the first regex in the list that matches at all.
    sub match_precedence {
        my($regs, $str) = @_;
        for(@$regs) {
            return $1 if $str =~ /($_)/
        }
        return;
    }

    __output__
    date is - 07222003
    That's not great code, but hopefully it'll give you something to work with.
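As one possible way to build on that, here is a hedged sketch of the same precedence loop that also reports which pattern matched. The helper name match_with_label and the [label, regex] pair layout are hypothetical, introduced only for illustration:

```perl
use strict;
use warnings;

my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $month    = qr{0[1-9]|1[012]};
my $fourYear = qr{2003};
my $twoYear  = qr{03};

# Ordered list of [label, regex] pairs; earlier entries take precedence.
my @date_regexes = (
    [ MMDDYYYY => qr{$month$dom$fourYear} ],
    [ DDMMYYYY => qr{$dom$month$fourYear} ],
    [ YYYYMMDD => qr{$fourYear$month$dom} ],
    [ DDMMYY   => qr{$dom$month$twoYear}  ],
    [ MMDDYY   => qr{$month$dom$twoYear}  ],
    [ MMDD     => qr{$month$dom}          ],
    [ DDMM     => qr{$dom$month}          ],
);

# Hypothetical helper: returns (label, matched text), or an empty list
# when none of the patterns match.
sub match_with_label {
    my ($regs, $str) = @_;
    for my $pair (@$regs) {
        my ($label, $re) = @$pair;
        return ($label, $1) if $str =~ /($re)/;
    }
    return;
}

for my $line (
    'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr',
    'USL_111406585_P085_A87_030723',
) {
    my ($label, $date) = match_with_label(\@date_regexes, $line);
    print defined $label ? "$label: $date\n" : "no match\n";
}
```

Seeing the label makes fall-through cases easy to spot: the second sample line only gets as far as MMDD and yields 1114, so if its trailing 030723 is meant to be the date (an assumption on my part), the list would need a pattern for that layout as well.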

    Update - well, you can use a regex with alternation, but it ain't pretty

    my $date_regex = qr{
        (?:
            .*($MMDDYYYY) | .*($DDMMYYYY) | .*($YYYYMMDD) | .*($DDMMYY) |
            .*($MMDDYY)   | .*($MMDD)     | .*($DDMM)
        )
    }x;
    Shudder, backtracking hell basically. It'll work, but it'll be hugely slow on big strings, so the above regex is really for "you can do it" purposes and I wouldn't advise using it!
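Since each branch of that alternation has its own capture group, the matched date can land in any of $1 through $7. A minimal sketch of one way to pull it out, cut down to two branches to keep it short (the reduced regex is an assumption for illustration, not the full pattern above):

```perl
use strict;
use warnings;

my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $month    = qr{0[1-9]|1[012]};
my $fourYear = qr{2003};
my $MMDDYYYY = qr{$month$dom$fourYear};
my $MMDD     = qr{$month$dom};

# Two-branch version of the alternation, same shape as the full regex.
my $date_regex = qr{ (?: .*($MMDDYYYY) | .*($MMDD) ) }x;

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# In list context the match returns every capture group; groups in
# branches that did not take part are undef, so keep the defined one.
my ($date) = grep { defined } $line =~ $date_regex;

print "date is - ", (defined $date ? $date : 'not found'), "\n";
```

Note that the greedy .* means each branch finds the last occurrence of its pattern in the string rather than the first.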
    HTH

    _________
    broquaint

      I see, I was looking at it wrong. In the way I stated it, it basically says look at all these patterns (in $Date) and report what matches. So it looks at the first character, compares it to all the regexes in $Date, nothing matches, next looks at the first and second characters, compares the regexes in $Date and finds nothing matches. Does the same for characters 1-3 and once it checks characters 1-4 it finds that one of the regexes in $Date does match and it quits.

      I was looking at it in the wrong way. Thanks for clearing that up.