Most recent pattern match

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question regarding the use of $&. In my code I use the Regexp::List module like so:

my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);

The contents of $regexp look like this:

(?-xism:(?i:(?=[acdilrsuw])(?:create (?:t(?:able|rigger)|function|default|pro[cedure]|rule|view)|d(?:rop (?:t(?:able|rigger)|default|function|rule|view)|elete )|s(?:p_(?:bind(?:efault|msg|rule)|drop(?:(?:g|row)lockpromote|key)|p(?:laceobject|rimarykey)|rename(?:_qpgroup)?|set(?:pg|row)lockpromote|unbind(?:efault|msg|rule)|add_qpgroup|chgattribute|foreignkey|hidetext)|etuser)|(?:alter|lock) table|(?:insert|update) |remove java|writetext)))

What I need to do is find out which pattern matched and then act on the answer. If the matched pattern was "insert", "update" or "delete", I need to take further action. What is the best way of handling this?
I've read some comments saying that $& is bad to use for performance reasons. Here is how I would do it with $&:

my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);
if ($string =~ /$regexp/) {
    if ($& eq "insert" or $& eq "delete" or $& eq "update") {
        # do something with $string
    }
    # call a subroutine here...
}

This code may be called a considerable number of times (millions). Since I'm just starting out in Perl, I'd like to make sure I am writing reasonable code. So how do I do this without using $&? Any help appreciated.


Re: Most recent pattern match by GrandFather (Saint) on Jan 27, 2006 at 10:57 UTC
Why not just:

my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);
if ($string =~ /$regexp/) {
    if ($string =~ /insert|delete|update/i) {
        # do something with $string
    }
}

DWIM is Perl's answer to Gödel
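A runnable sketch of this two-match approach follows. It assumes a small hand-written alternation as a stand-in for the Regexp::List output (Regexp::List itself is not used), and `classify` is a hypothetical helper name. One subtlety: the second match searches the whole string on its own, so a bare `/insert|delete|update/i` would also fire on a line like `create table update_log (id int)`. The word boundaries below narrow that.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Regexp::List output: a few keywords from the
# question (insert/update/delete end in a space, as in the posted regex).
my $regexp = qr/(?:insert |update |delete |create table|drop table)/i;

# First match: does the line match any pattern at all?
# Second match: is it one of the data-modifying keywords?
sub classify {
    my ($string) = @_;
    return 'none' unless $string =~ $regexp;
    return 'dml'  if $string =~ /\b(?:insert|delete|update)\b/i;
    return 'other';
}

for my $sql ('INSERT INTO t VALUES (1)', 'create table update_log (id int)', 'select 1') {
    printf "%-34s => %s\n", $sql, classify($sql);
}
```

The cost of this design is a second scan of every line that passes the first match, which is what the follow-up question below is about.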
Re^2: Most recent pattern match by Anonymous Monk on Jan 27, 2006 at 11:07 UTC
OK, thanks, a simple answer, but I'd like to understand the implications of using or not using $&. To my untrained eye, the answer you kindly posted seems to scan the string twice: once for the general pattern and again for the specific pattern, whereas using $& would not involve the second scan?
Re^3: Most recent pattern match by GrandFather (Saint) on Jan 27, 2006 at 19:15 UTC
I thought, without trying it, that the cost of the second match would be offset by not using $&, but the benchmarks I ran indicate that this is not the case. I didn't see any additional cost from using $& at all! I guess the moral is: try the simple stuff, but validate your assumptions.
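A minimal sketch of such a benchmark, using the core Benchmark module's `cmpthese`, is below. The regex and sample lines are made-up stand-ins. It deliberately leaves `$&` out: on perls older than 5.20, a single mention of `$&` anywhere in a program imposes a penalty on every regular expression match in that program, so a `$&` variant placed in the same script would also slow down the other variants (one possible reason a same-script comparison shows no difference). Timing the `$&` version in a separate script avoids that. Perl 5.20 enabled copy-on-write strings, which removed this penalty.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up stand-ins for the generated regex and the input data.
my $regexp = qr/(?:insert |update |delete |create table|drop table)/i;
my @lines  = ('insert into t values (1)', 'create table t (x int)',
              'select * from t', 'delete from t where x = 1') x 250;
my %is_dml = map { $_ => 1 } 'insert ', 'update ', 'delete ';

# Variant 1: one match with a capture group, then a hash lookup on $1.
sub with_capture {
    my $hits = 0;
    for my $s (@lines) {
        $hits++ if $s =~ /($regexp)/ and $is_dml{ lc $1 };
    }
    return $hits;
}

# Variant 2: GrandFather's two-match approach.
sub two_pass {
    my $hits = 0;
    for my $s (@lines) {
        $hits++ if $s =~ $regexp and $s =~ /insert|delete|update/i;
    }
    return $hits;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, { capture => \&with_capture, two_pass => \&two_pass });
```

Both variants count the same lines, so any difference in the table comes from the matching strategy rather than from the result.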
Re: Most recent pattern match by BrowserUk (Patriarch) on Jan 27, 2006 at 12:55 UTC
Not an answer to your original question, but that generated regex looks wrong to me. One of the first-level alternations deobfuscates to `(?:create (?: t(?:able|rigger) | function | default | pro[cedure] | rule | view ))`, which all looks fairly sane and familiar except for `pro[cedure]`, which is looking to accept `create (?:proc|proe|prod|prou|pror)`. Which I guess could be true, but it seems more likely that it should be `create proc(?:edure)?`. Maybe a patch for Regexp::List is called for?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
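BrowserUk's reading of `pro[cedure]` is easy to check. The sketch below anchors both patterns with `^` and `$` (an addition, not present in the generated regex) so that the whole string must match; unanchored, `create procedure` would still match the character-class version via its `create proc` prefix.

```perl
use strict;
use warnings;

# [cedure] is a character class: it matches exactly ONE of c, e, d, u, r.
my $class    = qr/^create pro[cedure]$/i;
# proc(?:edure)? matches "proc", optionally followed by "edure".
my $optional = qr/^create proc(?:edure)?$/i;

for my $s ('create proc', 'create proe', 'create procedure') {
    printf "%-17s class:%-3s optional:%s\n", $s,
        ($s =~ $class    ? 'yes' : 'no'),
        ($s =~ $optional ? 'yes' : 'no');
}
```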
Re^2: Most recent pattern match by Anonymous Monk on Jan 27, 2006 at 13:56 UTC
The patterns are read from a file and pushed into the array from which the regex list is built. The file contains `create pro[cedure]`. Perhaps it's an incorrect definition on my part. I wanted it to match in the way you have specified, i.e. like this: `create proc|proce|proced|procedu|procedur|procedure`
Re^3: Most recent pattern match by BrowserUk (Patriarch) on Jan 27, 2006 at 14:40 UTC
Okay, at one level it is a user/data error. However, it highlights another potential bug in Regexp::List, namely that it should escape regex special characters where they appear in literals. Unless it is meant to allow you to pass in regexes for mangling as well?

With respect to what you are trying to achieve, you'll have to read the docs for the module, but if it doesn't have its own syntax (I haven't looked) for specifying "this literal or an abbreviation of it with a minimum of N chars", you would probably have to repeat all of the possibilities in the input.

If it does allow you to supply partial regexes in the input data, then you'd have to reverse the ordering of your alternation above, because alternatives are tried left to right: if you put the shortest first, it will always be the one that matches, leaving the longer alternatives redundant and preventing a match of the entire word. You'd also need to group them, i.e. `create (?:procedure|procedur|procedu|proced|proce|proc)`

That said, if this is SQL, then I think most variants will accept either `create proc` or `create procedure`, but not the intermediate abbreviations, in which case you would just want `create proc(?:edure)?`

As I indicated, that assumes it will let you do that. If not, supplying both alternatives fully spelt out should allow it to derive the above for itself. I'd be very wary of it not escaping regex special chars, though.

I've also had a go at trying to make the regex you posted match a simple string of 'insert' or 'update', and it doesn't, though I haven't worked out why. At a glance it looks like it ought to, but I haven't expended a huge amount of effort trying to unwind its convolutions. Maybe you should consider one of the other alternatives, or at least compare the results they produce.
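The points about alternation order and escaping can be seen in a few lines of core Perl; the input string is made up. On escaping: the constructor call in the question passes `quotemeta => 0`, which, going by the option's name, presumably tells Regexp::List not to escape its input. That has not been checked against the module's documentation here.

```perl
use strict;
use warnings;

my $input = 'create procedure foo';

# Alternatives are tried left to right, so the shortest one wins here.
my ($short_first) = $input =~ /(create (?:proc|proce|procedure))/;
# Putting the longest alternative first lets the full word match.
my ($long_first)  = $input =~ /(create (?:procedure|proce|proc))/;

print "short first: '$short_first'\n";   # 'create proc'
print "long first:  '$long_first'\n";    # 'create procedure'

# quotemeta escapes regex metacharacters, so [cedure] becomes literal text.
my $literal = quotemeta 'create pro[cedure]';
print "escaped: $literal\n";
print "matches the literal text\n" if 'create pro[cedure]' =~ /\A$literal\z/;
```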
Re: Most recent pattern match by grinder (Bishop) on Jan 27, 2006 at 11:43 UTC
Try something like

my $regexp = Regexp::Assemble->new( flags => 'i', track => 1 )->add(@patterns);
if( $regexp->match($string) ) {
    print "$string was matched by pattern ", $regexp->matched, "\n";
}

This doesn't use `$&`, and also performs only a single match, no looping required. You could use the matched pattern as a key into a hash, and set up a dispatch table.

• another intruder with the mooring in the heart of the Perl
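The dispatch-table idea might look like the sketch below. To keep it runnable with core Perl only, it swaps Regexp::Assemble's `matched` for a capture group; the handlers and the `handle` function are hypothetical placeholders for whatever action each keyword needs.

```perl
use strict;
use warnings;

# Hypothetical handlers: one code ref per keyword that needs action.
my %dispatch = (
    insert => sub { "audit insert: $_[0]" },
    update => sub { "audit update: $_[0]" },
    delete => sub { "audit delete: $_[0]" },
);
my $default = sub { "no special action: $_[0]" };

# Build one alternation from the hash keys (escaped, longest first).
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %dispatch;
my $re  = qr/\b($alt)\b/i;

sub handle {
    my ($sql) = @_;
    return $default->($sql) unless $sql =~ $re;
    # lc() because the match is case-insensitive but the hash keys are not.
    return $dispatch{ lc $1 }->($sql);
}

for my $sql ('UPDATE t SET x = 1', 'select 1') {
    my $out = handle($sql);
    print "$out\n";
}
```

Adding a new action then means adding one hash entry rather than another `eq` test.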
Re: Most recent pattern match by ikegami (Patriarch) on Jan 27, 2006 at 18:57 UTC
Using $& will slow down every regexp in your program that doesn't capture. The minimal fix is:

if ($string =~ /($regexp)/) {
    if ($1 eq "insert" or $1 eq "delete" or $1 eq "update") {
        # do something with $string
    }
    # call a subroutine here...
}
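One thing to watch with the `eq` comparison: in the generated regex posted in the question, the DML alternatives include a trailing space (`(?:insert|update) ` and `d(?:...|elete )`), and the whole pattern is case-insensitive. So for a line like `INSERT INTO t ...`, `$1` (and `$&`) would be `"INSERT "`, and `$1 eq "insert"` would be false. The same trailing space is probably why the posted regex does not match a bare 'insert'. The sketch below, using a hand-written stand-in regex and a hypothetical `needs_action` helper, lowercases and trims the capture before a hash lookup.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the generated regex: like the posted one, it is
# case-insensitive and the DML keywords end in a space.
my $regexp = qr/(?i:(?:insert|update) |delete |create table|drop table)/;

my %is_dml = map { $_ => 1 } qw(insert update delete);

sub needs_action {
    my ($string) = @_;
    return 0 unless $string =~ /($regexp)/;   # one match, captured in $1
    (my $kw = lc $1) =~ s/\s+\z//;             # "INSERT " -> "insert"
    return $is_dml{$kw} ? 1 : 0;
}

for my $sql ('INSERT INTO t VALUES (1)', 'create table t (x int)') {
    printf "%-25s %s\n", $sql, needs_action($sql) ? 'act' : 'skip';
}
```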