Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a question regarding the use of $&. In my code I use the Regexp::List module like so:
my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);

The contents of $regexp look like this
(?-xism:(?i:(?=[acdilrsuw])(?:create (?:t(?:able|rigger)|function|default|pro[cedure]|rule|view)|d(?:rop (?:t(?:able|rigger)|default|function|rule|view)|elete )|s(?:p_(?:bind(?:efault|msg|rule)|drop(?:(?:g|row)lockpromote|key)|p(?:laceobject|rimarykey)|rename(?:_qpgroup)?|set(?:pg|row)lockpromote|unbind(?:efault|msg|rule)|add_qpgroup|chgattribute|foreignkey|hidetext)|etuser)|(?:alter|lock) table|(?:insert|update) |remove java|writetext)))
What I need to do is find out which pattern matched and then do something based on the answer. If the pattern that matched was "insert", "update" or "delete", I need to take further action. The question is: what is the best way of handling this?
I've read some comments saying that using $& is bad for performance. For example, my code currently looks like this:
my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);
if ($string =~ /$regexp/) {
    if ($& eq "insert" or $& eq "delete" or $& eq "update") {
        # do something with $string
    }
    # call a subroutine here...
}
This example code may be called a considerable number of times (i.e. millions). Since I'm starting out in Perl I'd like to make sure I am writing reasonable code. So how do I do this without using $& ? Any help appreciated.

Replies are listed 'Best First'.
Re: Most recent pattern match
by GrandFather (Saint) on Jan 27, 2006 at 10:57 UTC

    Why not just:

    my $regexp = Regexp::List->new(modifiers => 'i', quotemeta => 0)->list2re(@patterns);
    if ($string =~ /$regexp/) {
        if ($string =~ /insert|delete|update/i) {
            # do something with $string
        }
    }

    DWIM is Perl's answer to Gödel
      OK, thanks. A simple answer, but I'd like to understand the implications of using or not using $&. To my untrained eye it seems that the answer you kindly posted will scan the string twice: once for the general pattern search and then again for the specific pattern, whereas using $& would not involve the second scan?

        I thought, without trying it, that the hit from the second match would be compensated for by not using $&, but benchmark tests I tried indicate that that is not the case. I didn't see any additional hit as a result of using $& at all!

        I guess the moral is: try the simple stuff, but validate assumptions.


Re: Most recent pattern match
by BrowserUk (Patriarch) on Jan 27, 2006 at 12:55 UTC

    Not an answer to your original question, but that generated regex looks wrong to me? One of the first level alternations deobfuscates to

    create (?: t(?:able|rigger) | function | default | pro[cedure] | rule | view )

    Which all looks fairly sane and familiar except for pro[cedure] which is looking to accept

    create (?:proc|proe|prod|prou|pror)

    Which I guess could be true, but it seems more likely that it should be

    create proc(?:edure)?
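
A minimal sketch of the difference between the two forms. The anchors (^ and $) and the helper name matches() are added here purely for illustration; they are not part of the generated regex or the original patterns file.

```perl
use strict;
use warnings;

# Character class: "pro" followed by exactly ONE of c, e, d, u, r.
my $class_re    = qr/^create pro[cedure]$/i;
# Optional group: "proc", optionally followed by "edure".
my $optional_re = qr/^create proc(?:edure)?$/i;

# Hypothetical helper: returns 1 if $text matches $re, otherwise 0.
sub matches {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $text ('create proc', 'create proe', 'create procedure') {
    printf "%-18s class=%d optional=%d\n",
        $text, matches($class_re, $text), matches($optional_re, $text);
}
```

Without the anchors, pro[cedure] would still find a match inside "create procedure", but only as far as "create proc".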

    Maybe a patch for Regexp::List is called for?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The patterns are read from a file and pushed into the array upon which the regex list is built. The file contains this
      create pro[cedure]
      Perhaps it's an incorrect definition on my part. I wanted it to match in the way that you have specified, i.e. like this

      create proc|proce|proced|procedu|procedur|procedure

        Okay, at one level it is a user/data error. However, it highlights another potential bug in Regexp::List, namely that it should escape regex special characters where they appear in literals. Unless it is meant to allow you to pass in regexes for mangling as well?

        With respect to what you are trying to achieve, you'll have to read the docs for the module, but if it doesn't have its own syntax (I haven't looked) for specifying "this literal or an abbreviation of it with a minimum of N chars", you would probably have to repeat all of the possibilities in the input.

        If it does allow you to supply partial regexes in the input data, then you'd have to reverse the ordering of your alternation above. Alternatives are tried left to right, so if you put the shortest first, it will always match, leaving the rest of any longer alternative redundant and preventing a match of the entire string. You'd also need to group them, i.e.

        create (?:procedure|procedur|procedu|proced|proce|proc)
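
A small sketch of why the ordering matters, assuming plain Perl alternation with no other constraints. The helper first_match() and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text matched by $re, or undef if none.
sub first_match {
    my ($re, $text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

# Same alternatives, opposite order.
my $shortest_first = qr/create (?:proc|proce|proced|procedu|procedur|procedure)/;
my $longest_first  = qr/create (?:procedure|procedur|procedu|proced|proce|proc)/;

# Perl takes the first alternative that succeeds, not the longest one.
print first_match($shortest_first, 'create procedure foo'), "\n";  # create proc
print first_match($longest_first,  'create procedure foo'), "\n";  # create procedure
```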

        That said, if this is SQL, then I think most variants will either accept create proc or create procedure, but not the other intermediate abbreviations, in which case you would just want

        create proc(?:edure)?

        As I indicated--assuming it will let you do that. If not, supplying a case for both alternatives fully spelt out should allow it to derive the above for itself.

        I'd be very wary of it not escaping regex special chars though.

        I've also had a go at trying to make the regex you posted match a simple string of 'insert' or 'update', and it doesn't, though I haven't worked out why. It looks cursorily like it ought to, but I haven't expended a huge amount of effort trying to unwind its convolutions. Maybe you should consider one of the other alternatives--or at least compare the results they produce.


Re: Most recent pattern match
by grinder (Bishop) on Jan 27, 2006 at 11:43 UTC

    Try something like

    my $regexp = Regexp::Assemble->new( flags => 'i', track => 1 )->add(@patterns);
    if( $regexp->match($string) ) {
        print "$string was matched by pattern ", $regexp->matched, "\n";
    }

    This doesn't use $&, and also performs only a single match, no looping required. You could use the matched pattern as a key into a hash, and set up a dispatch table.
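
The dispatch-table idea could look roughly like the sketch below. It uses only core Perl so that it runs without CPAN modules: $pattern stands in for whatever $regexp->matched returns, and the handlers, their return values and the trailing-whitespace trimming are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical handlers; the thread only says "do something with $string".
my %dispatch = (
    insert => sub { return "dml: $_[0]" },
    update => sub { return "dml: $_[0]" },
    delete => sub { return "dml: $_[0]" },
);

# $pattern stands in for the value of $regexp->matched.
sub handle_match {
    my ($pattern, $string) = @_;
    (my $key = lc $pattern) =~ s/\s+\z//;    # normalise case, drop trailing space
    my $handler = $dispatch{$key} || sub { return "other: $_[0]" };
    return $handler->($string);
}

print handle_match('insert ', 'insert into t values (1)'), "\n";
print handle_match('create table', 'create table t (i int)'), "\n";
```

A hash lookup keeps the per-call cost constant however many patterns need special handling.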

    • another intruder with the mooring in the heart of the Perl

Re: Most recent pattern match
by ikegami (Patriarch) on Jan 27, 2006 at 18:57 UTC
    Using $& will slow down every regexp in your programs that doesn't capture. The minimal fix is:
    if ($string =~ /($regexp)/) {
        if ($1 eq "insert" or $1 eq "delete" or $1 eq "update") {
            # do something with $string
        }
        # call a subroutine here...
    }
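
One caveat, sketched below under stated assumptions: the regex as posted appears to match "insert ", "update " and "delete " with a trailing space and is case-insensitive, so comparing $1 directly against "insert" may never succeed. The cut-down regex, the helper matched_keyword() and the %is_dml lookup are stand-ins invented for this example, not the original code.

```perl
use strict;
use warnings;

# Cut-down stand-in for the generated regex; note the trailing spaces.
my $regexp = qr/(?i:(?:insert|update) |delete |(?:alter|lock) table)/;

my %is_dml = map { $_ => 1 } qw(insert update delete);

# Returns the normalised keyword that matched, or undef if nothing matched.
sub matched_keyword {
    my ($string) = @_;
    return unless $string =~ /($regexp)/;
    my $keyword = lc $1;        # the match is case-insensitive
    $keyword =~ s/\s+\z//;      # drop the trailing space from the pattern
    return $keyword;
}

for my $sql ('INSERT into t values (1)', 'alter table t add c int', 'select 1') {
    my $keyword = matched_keyword($sql);
    if    (!defined $keyword)  { print "no match: $sql\n" }
    elsif ($is_dml{$keyword})  { print "DML ($keyword): $sql\n" }
    else                       { print "other ($keyword): $sql\n" }
}
```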