artist has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I would like to match whatever matches first among my patterns:

#!/usr/local/bin/perl -w use strict; my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @patterns = ('B.B', 'CB'); foreach my $string ($str1,$str2){ foreach my $pat (@patterns){ if($string =~ /$pat/){ print "String:$string Pattern:$pat KeyWord:$` +\n";; last; } } }
The Output is:
String:ABCBXBCA Pattern:B.B KeyWord:A
String:APCBXBCAC Pattern:B.B KeyWord:APC
I would like to have
String:ABCBXBCA Pattern:B.B KeyWord:A
String:APCBXBCAC Pattern:B.B KeyWord:AP
Just note the only change of AP from APC in the current results.. as the pattern 'CB' can match first before 'B.B'.

How I can accomplish this..
Thanks,
Artist

Replies are listed 'Best First'.
Re: First Pattern Matching
by BrowserUk (Patriarch) on Jul 10, 2002 at 22:42 UTC

    One way to acheive your end, if you don't mind not knowing which pattern matched, but only need the keyword, would be to combine your patterns using the | (regex or operator) and remove the inner loop as here.

    #!/usr/local/bin/perl -w use strict; my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my $patterns = 'B.B|CB'; foreach my $string ($str1,$str2){ # foreach my $pat (@patterns){ if($string =~ /$patterns/){ print "String:$string Pattern:$patterns KeyWor +d:$` \n";; # last; } # } } __DATA__ #ouptut C:\test>180890 String:ABCBXBCA Pattern:B.B|CB KeyWord:A String:APCBXBCAC Pattern:B.B|CB KeyWord:AP C:\test>

    Update:Following jryan's post below, I thought I remembered a special var that would return the last match. Looking it up I found $+.

    #!/usr/local/bin/perl -w use strict; my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my $patterns = 'B.B|CB'; foreach my $string ($str1,$str2){ # foreach my $pat (@patterns){ if($string =~ /($patterns)/) { print "String:$string Pattern:$+ KeyWord:$` \n +";; # last; } # } } __DATA__ #output C:\test>180890 String:ABCBXBCA Pattern:BCB KeyWord:A String:APCBXBCAC Pattern:CB KeyWord:AP C:\test>

    Note the brackets in the if statement and the $+ in the print.

    UpdateAs jryan notes below, this does still not completely match the requirements of the original question. I, nor anyone else it seems, had noticed that $+ returns the text matched not the pattern that matched it.

    If you need to know which pattern matched, you will have to either use jryan solution from here or possibly look into using the scarey (to me at least) technique that hofmator doesn't suggest in the last if statement in this post. If you opt for the latter, you'd better takes a good look at some of japhys or abigail-II's posts as this level of regex is well beyond me at the moment.

    Finally. Check perlman:perlre for dire warning regarding the performance hit of using $` which you already had, and $+ which I added.


    Anyone know of an abbottoire going cheap?

      I just noticed that your answer is still incomplete. It only outputs the last match, and not the last pattern matched. For instance: BCB is not the same as B.B. You'll still have to do some sort of pattern lookup (such as I did below, or use something like Tie::Hash::Regex or Tie::Hash::Approx to construct a table and then match against that.

Re: Re: First Pattern Matching
by jryan (Vicar) on Jul 10, 2002 at 22:54 UTC
    #!perl -w use strict; my @patterns = ('B.B', 'CB'); # join the patterns to do one match per string my $pat = join '|',@patterns; foreach my $string ('ABCBXBCA','APCBXBCAC') { if($string =~ /($pat)/o) { # then do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } print "String:$string Pattern:$matched KeyWord:$` \n"; } }

    I enjoyed the creative use of $`, btw.

    Reparented to root per author's req. - dvergin 2002-07-10

      As you can read here I like to qr// my patterns. In my case I gained nothing from it (other than not biting myself), but you'd gain a great deal by qr//:ing the @patterns. And while we're talking about qr//, I'd prefer to see a qr// object instead of an o modifier in the if expression.

      The o modifier is evil and as of qr//'s introduction we no longer need it. But we'd be fine even without both qr// and the o modifier in this particular case. That is because perl is friendly enough to remember the last pattern used for every match op. If the patterns are identical (stringwise) the last compiled regex for that match op is used. Since you only use $pat once no recompilation is done.

      Demonstration:
      my @patterns1 = ('foo') x 2; my @patterns2 = ('bar') x 2; use re 'debug'; while (@patterns1) { 'a' =~ shift @patterns1; 'b' =~ shift @patterns2; } __END__ Compiling REx `foo' size 3 first at 1 1: EXACT <foo>(3) 3: END(0) anchored `foo' at 0 (checking anchored isall) minlen 3 Compiling REx `bar' size 3 first at 1 1: EXACT <bar>(3) 3: END(0) anchored `bar' at 0 (checking anchored isall) minlen 3 Freeing REx: `foo' Freeing REx: `bar'
      As you can see, the foo pattern and the bar pattern are only compiled once. But there's no harm in using qr// here, so I still suggest it.

      Cheers,
      -Anomo

        I'm glad you brought up this point. I too am a huge fan of qr; however, I think this situation is a perfect use of the /o operator. I assumed that the snippet the author posted was but a morsal of his actual code; he probably uses dozens of patterns and thousands of lines of input. Compiling the regex with /o (rather than building it with qr) is ideal for this situation where a single regex is to be applied to huge amounts of data. It will result in a speed boost. For instance, I modified my earlier code and ran this benchmark:

        use Benchmark; timethese(1000, { Slasho => \&withslasho, None => \&without, qr => \&withqr }); sub withslasho { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '|',@patterns; foreach my $string (@array) { if($string =~ /($pat)/o) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } } sub without { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '|',@patterns; foreach my $string (@array) { if($string =~ /($pat)/) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } } sub withqr { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '|',@patterns; $pat = qr/$pat/; foreach my $string (@array) { if($string =~ /($pat)/) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } }

        Which outputs:

        Benchmark: timing 1000 iterations of None, Slasho, qr... None: 70 wallclock secs (69.60 usr + 0.00 sys = 69.60 CPU) @ 14 +.37/s (n=1000) Slasho: 61 wallclock secs (61.24 usr + 0.00 sys = 61.24 CPU) @ 16 +.33/s (n=1000) qr: 66 wallclock secs (65.80 usr + 0.00 sys = 65.80 CPU) @ 15 +.20/s (n=1000)
Re: First Pattern Matching
by Anonymous Monk on Jul 11, 2002 at 21:31 UTC
    This is my solution. Its advantage is that it doesn't check which pattern that matched afterwards. Instead it registers which pattern that matched during the match.

    my @strs = qw/ ABCBXBCA APCBXBCAC /; my @patterns = (qr/B.B/, qr/CB/); # See comment below. my $matched_pattern; my $all_pats = do { use re 'eval'; qr/(.*?)(?:@{[join '|', map "$patterns[$_](?{\$matched_pattern = $ +_})", 0 .. $#patterns]})/; }; foreach (@strs) { print "String:$_ Pattern:$patterns[$matched_pattern] KeyWord:$1\n" if /$all_pats/; }
    I use qr// around all the patterns in the assignment to @patterns in this code. That is to avoid slip-ups involving \s and such (for double-quoted strings). However, doing that is a slight compile-time performance hit in this case since I never use the patterns directly, just indirectly. But I like to qr// them anyway, especially if it is an important piece of code. (And besides, you get a FREE and FUN non-capturing parenthesis around it! :))

    Cheers,
    -Anomo
Re: First Pattern Matching
by Aristotle (Chancellor) on Jul 11, 2002 at 22:42 UTC
    The Anonymonk is close, but I prefer a solution that works without code assertions:
    #!/usr/local/bin/perl -w use strict; my @pattern = ('B.B', 'CB'); my $re = qr/^(.*?)(?:@{[ join "|", map "($_)", @pattern ]})/; foreach (qw(ABCBXBCA APCBXBCAC)){ if(my ($keyword,@match) = /$re/) { my $i = 0; $i++ until defined shift @match; print "String:$_ Pattern:$pattern[$i] KeyWord:$keyword +\n"; } }

    The key is that I capture separately for each pattern.

    Update: per jryan's point, edited to generate the regex once before the loop.

    Makeshifts last the longest.

      Hrrmm, I must admit I like this solution better than my solution. It's cleaner and simpler. Darn that "Keep it simple". ;)

      But it still suffers from having to look up the match afterwards, and that I don't like. Indeed it's a good way, but for a large @pattern and some "bad luck" you still have notably unnecessary overhead when having large lists to check against. For simple things that other people might read I'd use your method, but for a module that needs efficiency (don't all?) I'd use my way, I think.

      Cheers,
      -Anomo