First Pattern Matching

artist has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: First Pattern Matching by BrowserUk (Patriarch) on Jul 10, 2002 at 22:42 UTC
One way to acheive your end, ~~if you don't mind not knowing which pattern matched, but only need the keyword~~, would be to combine your patterns using the \| (regex or operator) and remove the inner loop as here. #!/usr/local/bin/perl -w use strict; my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my $patterns = 'B.B\|CB'; foreach my $string ($str1,$str2){ # foreach my $pat (@patterns){ if($string =~ /$patterns/){ print "String:$string Pattern:$patterns KeyWor +d:$` \n";; # last; } # } } __DATA__ #ouptut C:\test>180890 String:ABCBXBCA Pattern:B.B\|CB KeyWord:A String:APCBXBCAC Pattern:B.B\|CB KeyWord:AP C:\test> [download] Update:Following jryan's post below, I thought I remembered a special var that would return the last match. Looking it up I found $+. #!/usr/local/bin/perl -w use strict; my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my $patterns = 'B.B\|CB'; foreach my $string ($str1,$str2){ # foreach my $pat (@patterns){ if($string =~ /($patterns)/) { print "String:$string Pattern:$+ KeyWord:$` \n +";; # last; } # } } __DATA__ #output C:\test>180890 String:ABCBXBCA Pattern:BCB KeyWord:A String:APCBXBCAC Pattern:CB KeyWord:AP C:\test> [download] Note the brackets in the if statement and the $+ in the print. UpdateAs jryan notes below, this does still not completely match the requirements of the original question. I, nor anyone else it seems, had noticed that $+ returns the text matched not the pattern that matched it. If you need to know which pattern matched, you will have to either use jryan solution from here or possibly look into using the scarey (to me at least) technique that hofmator doesn't suggest in the last `if` statement in this post. If you opt for the latter, you'd better takes a good look at some of japhys or abigail-II's posts as this level of regex is well beyond me at the moment. Finally. Check perlman:perlre for dire warning regarding the performance hit of using $` which you already had, and $+ which I added. Anyone know of an abbottoire going cheap?	[reply] [d/l] [select]
Re: Re: First Pattern Matching by jryan (Vicar) on Jul 11, 2002 at 22:00 UTC
I just noticed that your answer is still incomplete. It only outputs the last match, and not the last pattern matched. For instance: `BCB` is not the same as `B.B`. You'll still have to do some sort of pattern lookup (such as I did below, or use something like Tie::Hash::Regex or Tie::Hash::Approx to construct a table and then match against that.	[reply] [d/l] [select]
Re: Re: First Pattern Matching by jryan (Vicar) on Jul 10, 2002 at 22:54 UTC
#!perl -w use strict; my @patterns = ('B.B', 'CB'); # join the patterns to do one match per string my $pat = join '\|',@patterns; foreach my $string ('ABCBXBCA','APCBXBCAC') { if($string =~ /($pat)/o) { # then do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } print "String:$string Pattern:$matched KeyWord:$` \n"; } } [download] I enjoyed the creative use of $`, btw. Reparented to root per author's req. - dvergin 2002-07-10	[reply] [d/l]
Re: Re: Re: First Pattern Matching by Anonymous Monk on Jul 11, 2002 at 23:23 UTC
As you can read here I like to `qr//` my patterns. In my case I gained nothing from it (other than not biting myself), but you'd gain a great deal by `qr//`:ing the `@patterns`. And while we're talking about `qr//`, I'd prefer to see a `qr//` object instead of an o modifier in the `if` expression. The o modifier is evil and as of `qr//`'s introduction we no longer need it. But we'd be fine even without both `qr//` and the o modifier in this particular case. That is because perl is friendly enough to remember the last pattern used for every match op. If the patterns are identical (stringwise) the last compiled regex for that match op is used. Since you only use `$pat` once no recompilation is done. Demonstration: my @patterns1 = ('foo') x 2; my @patterns2 = ('bar') x 2; use re 'debug'; while (@patterns1) { 'a' =~ shift @patterns1; 'b' =~ shift @patterns2; } __END__ Compiling REx `foo' size 3 first at 1 1: EXACT <foo>(3) 3: END(0) anchored `foo' at 0 (checking anchored isall) minlen 3 Compiling REx `bar' size 3 first at 1 1: EXACT <bar>(3) 3: END(0) anchored `bar' at 0 (checking anchored isall) minlen 3 Freeing REx: `foo' Freeing REx: `bar' [download] As you can see, the foo pattern and the bar pattern are only compiled once. But there's no harm in using `qr//` here, so I still suggest it. Cheers, -Anomo	[reply] [d/l] [select]
Re: Re: Re: Re: First Pattern Matching by jryan (Vicar) on Jul 12, 2002 at 01:19 UTC
I'm glad you brought up this point. I too am a huge fan of qr; however, I think this situation is a perfect use of the /o operator. I assumed that the snippet the author posted was but a morsal of his actual code; he probably uses dozens of patterns and thousands of lines of input. Compiling the regex with /o (rather than building it with qr) is ideal for this situation where a single regex is to be applied to huge amounts of data. It will result in a speed boost. For instance, I modified my earlier code and ran this benchmark: use Benchmark; timethese(1000, { Slasho => \&withslasho, None => \&without, qr => \&withqr }); sub withslasho { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '\|',@patterns; foreach my $string (@array) { if($string =~ /($pat)/o) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } } sub without { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '\|',@patterns; foreach my $string (@array) { if($string =~ /($pat)/) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } } sub withqr { my $str1 = 'ABCBXBCA'; my $str2 = 'APCBXBCAC'; my @array = ($str1, $str2) x 500; my @patterns = ('B.B', 'CB')x10; my $pat = join '\|',@patterns; $pat = qr/$pat/; foreach my $string (@array) { if($string =~ /($pat)/) { # do a pattern lookup to see which pattern matched. my $matched; foreach my $p (@patterns) { if ($1 =~ /$p/) { $matched = $p; last; } } } } } [download] Which outputs: `Benchmark: timing 1000 iterations of None, Slasho, qr... None: 70 wallclock secs (69.60 usr + 0.00 sys = 69.60 CPU) @ 14 +.37/s (n=1000) Slasho: 61 wallclock secs (61.24 usr + 0.00 sys = 61.24 CPU) @ 16 +.33/s (n=1000) qr: 66 wallclock secs (65.80 usr + 0.00 sys = 65.80 CPU) @ 15 +.20/s (n=1000)` [download]	[reply] [d/l] [select]
Re: Re: Re: Re: Re: First Pattern Matching by Anonymous Monk on Jul 12, 2002 at 02:28 UTC
Re: First Pattern Matching by Anonymous Monk on Jul 11, 2002 at 21:31 UTC
This is my solution. Its advantage is that it doesn't check which pattern that matched afterwards. Instead it registers which pattern that matched during the match. `my @strs = qw/ ABCBXBCA APCBXBCAC /; my @patterns = (qr/B.B/, qr/CB/); # See comment below. my $matched_pattern; my $all_pats = do { use re 'eval'; qr/(.*?)(?:@{[join '\|', map "$patterns[$_](?{\$matched_pattern = $ +_})", 0 .. $#patterns]})/; }; foreach (@strs) { print "String:$_ Pattern:$patterns[$matched_pattern] KeyWord:$1\n" if /$all_pats/; }` [download] I use `qr//` around all the patterns in the assignment to `@patterns` in this code. That is to avoid slip-ups involving `\s` and such (for double-quoted strings). However, doing that is a slight compile-time performance hit in this case since I never use the patterns directly, just indirectly. But I like to `qr//` them anyway, especially if it is an important piece of code. (And besides, you get a FREE and FUN non-capturing parenthesis around it! `:)`) Cheers, -Anomo	[reply] [d/l] [select]
Re: First Pattern Matching by Aristotle (Chancellor) on Jul 11, 2002 at 22:42 UTC
The Anonymonk is close, but I prefer a solution that works without code assertions: `#!/usr/local/bin/perl -w use strict; my @pattern = ('B.B', 'CB'); my $re = qr/^(.?)(?:@{[ join "\|", map "($_)", @pattern ]})/; foreach (qw(ABCBXBCA APCBXBCAC)){ if(my ($keyword,@match) = /$re/) { my $i = 0; $i++ until defined shift @match; print "String:$_ Pattern:$pattern[$i] KeyWord:$keyword +\n"; } }` [download] The key is that I capture separately for each pattern. Update:* per jryan's point, edited to generate the regex once before the loop. Makeshifts last the longest.	[reply] [d/l]
Re: Re: First Pattern Matching by Anonymous Monk on Jul 11, 2002 at 23:56 UTC
Hrrmm, I must admit I like this solution better than my solution. It's cleaner and simpler. Darn that "Keep it simple". `;)` But it still suffers from having to look up the match afterwards, and that I don't like. Indeed it's a good way, but for a large `@pattern` and some "bad luck" you still have notably unnecessary overhead when having large lists to check against. For simple things that other people might read I'd use your method, but for a module that needs efficiency (don't all?) I'd use my way, I think. Cheers, -Anomo	[reply] [d/l] [select]