in reply to Re: Re: qr() match order with multiple patterns
in thread qr() match order with multiple patterns

The first line, the date is 07222003, but it reports the 1114 that is found just after the 'F' in the beginning of the line
Correct, as that matches $MMDD. This is because the regex engine tries the whole alternation at each position of the string, working from left to right, and because $MMDD matches 1114 at the second character, before any of the other patterns can match anywhere, it's the first date to be returned. For some really detailed output on how this behaviour works, try adding use re 'debug' to the top of your script to see exactly what the regex engine is doing at every step.
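To see the leftmost-match behaviour in isolation, here is a minimal sketch. The two patterns are cut-down stand-ins that I'm assuming for illustration, not the original poster's full $Date regex, and the year is hard-coded to 2003.

```perl
use strict;
use warnings;

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# Cut-down stand-ins for the patterns in this thread (assumed for illustration).
my $month    = qr{0[1-9]|1[012]};
my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $MMDDYYYY = qr{$month$dom(?:2003)};
my $MMDD     = qr{$month$dom};

# $MMDDYYYY is listed first, but the engine tries every alternative at
# offset 0, then offset 1, and so on, so the leftmost match wins.
my ($found) = $line =~ /($MMDDYYYY|$MMDD)/;
my $offset  = $-[1];    # start offset of capture group 1

print "alternation found: $found at offset $offset\n";
```

Putting $MMDDYYYY first in the alternation only decides which alternative is tried first at a given position; it does not make the engine skip ahead to look for it elsewhere in the string.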

What you probably want instead of alternation, which won't do what you want in this particular case, is code that tries a list of regexes against the string in turn, where the order of the list determines the precedence of each regex, e.g.

    my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
    my $month    = qr{0[1-9]|1[012]};
    my $fourYear = qr{2003};
    my $twoYear  = qr{03};

    my $MMDDYYYY = qr{$month$dom$fourYear};
    my $DDMMYYYY = qr{$dom$month$fourYear};
    my $YYYYMMDD = qr{$fourYear$month$dom};
    my $DDMMYY   = qr{$dom$month$twoYear};
    my $MMDDYY   = qr{$month$dom$twoYear};
    my $MMDD     = qr{($month$dom)};
    my $DDMM     = qr{$dom$month};

    my @date_regexes = (
        $MMDDYYYY, $DDMMYYYY, $YYYYMMDD,
        $DDMMYY, $MMDDYY, $MMDD, $DDMM,
    );

    my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

    print "date is - ", match_precedence(\@date_regexes, $line), $/;

    sub match_precedence {
        my($regs, $str) = @_;
        for(@$regs) {
            return $1 if $str =~ /($_)/
        }
        return;
    }

    __output__

    date is - 07222003
That's not great code, but hopefully it'll give you something to work with.

Update - well, you can use a regex with alternation, but it ain't pretty

    my $date_regex = qr{
        (?:
            .*($MMDDYYYY)|.*($DDMMYYYY)|.*($YYYYMMDD)|.*($DDMMYY)|
            .*($MMDDYY)|.*($MMDD)|.*($DDMM)
        )
    }x;
Shudder, backtracking hell basically. It'll work, but it'll be hugely slow on big strings, so the above regex is really just there to show that you can do it, and I wouldn't advise using it!
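If you do want to play with that approach, here is a rough sketch of how the date could be pulled out of such a regex. It uses only two cut-down patterns (assumed for illustration, with the year hard-coded to 2003) rather than the full set above. Only the capture group belonging to the alternative that matched is defined, so a grep for defined values picks it out.

```perl
use strict;
use warnings;

my $line = 'F111406585_D072203_B085_E087_T047-P085_FCC_07222003_539.cdr';

# Cut-down stand-ins for the patterns in this thread (assumed for illustration).
my $month    = qr{0[1-9]|1[012]};
my $dom      = qr{0[1-9]|[12][0-9]|3[01]};
my $MMDDYYYY = qr{$month$dom(?:2003)};
my $MMDD     = qr{$month$dom};

# Each alternative gets a leading .* so that it may match anywhere in the
# string; a later alternative is only tried once the earlier one has failed
# everywhere, which is where all the backtracking comes from.
my $date_regex = qr{ (?: .*($MMDDYYYY) | .*($MMDD) ) }x;

# In list context the match returns ($1, $2); exactly one is defined.
my ($date) = grep { defined } $line =~ $date_regex;

print "date is - $date\n";
```

One thing to watch: because .* is greedy, this finds the last occurrence of the highest-precedence pattern in the string, whereas the loop over @date_regexes finds the first. For this particular line the two agree.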
HTH

_________
broquaint

Re: Re: Re: Re: qr() match order with multiple patterns
by gnu@perl (Pilgrim) on Jul 23, 2003 at 14:17 UTC
    I see, I was looking at it wrong. In the way I stated it, it basically says look at all these patterns (in $Date) and report what matches. So it looks at the first character, compares it to all the regexes in $Date, nothing matches, next looks at the first and second characters, compares the regexes in $Date and finds nothing matches. Does the same for characters 1-3 and once it checks characters 1-4 it finds that one of the regexes in $Date does match and it quits.

    I was looking at it in the wrong way. Thanks for clearing that up.