
Thanks! I tried this, and it appears to speed execution by 5-7% in the current environment. Basically, this application accepts a list of department code prefixes and searches through a list of employees to extract those whose department code matches one of the prefixes. I'm actually split()ing each line of the file and matching only one field, then using the other fields in business logic if any of the department code prefixes match.
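For reference, the split-then-match workflow described above might look roughly like the following minimal sketch. It assumes the "bulk" regex is a single anchored alternation of all the prefixes; the sub names `build_prefix_regex` and `matching_records`, the comma delimiter, the field index 7 (borrowed from the `$fields[7]` used later in the thread), and the sample records are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compile every prefix into one anchored alternation.
# quotemeta() escapes regex metacharacters, and ^ anchors the match to the
# start of the field so only true prefixes count.
sub build_prefix_regex {
    my $alt = join '|', map { quotemeta($_) } @_;
    return qr/^(?:$alt)/;
}

# Split each record, test only the department-code field (index $field),
# and keep the records whose code starts with one of the prefixes.
sub matching_records {
    my ($re, $field, @lines) = @_;
    my @hits;
    for my $line (@lines) {
        my @fields = split /,/, $line;
        next unless defined $fields[$field];
        push @hits, $line if $fields[$field] =~ $re;
    }
    return @hits;
}

# Made-up sample records: the department code sits in field 7.
my @lines = (
    'e1,x,x,x,x,x,x,ABCD123456',
    'e2,x,x,x,x,x,x,DEFG123456',
    'e3,x,x,x,x,x,x,DEFG998800',
    'e4,x,x,x,x,x,x,DEEE11223344',
    'e5,x,x,x,x,x,x,XYZ0001',
);

my $re   = build_prefix_regex(qw(ABCD12 DEF DEEE11223));
my @hits = matching_records($re, 7, @lines);
print "$_\n" for @hits;
```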

Re: Re: Re: Bulk Regex?
by RMGir (Prior) on Aug 29, 2002 at 16:15 UTC
    Aha! Then don't use a regex.
    # substitute in your dept codes, and a better hash name :)
    my %deptCodesICareAbout = ( RSRCH => 1, SALES => 1 );

    # substitute in the right FILE, field list, and delimiter...
    while (<FILE>) {
        chomp;
        my ($code, @whateverElse) = split /,/;
        if ($deptCodesICareAbout{$code}) {
            # business logic
        }
    }
    That will be MUCH faster, since each line costs a single hash lookup instead of a regex match.
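    If the codes arrive as a list (say, from the command line) rather than being typed into the source, the lookup hash can be built with a hash slice. A minimal sketch, assuming the code is the first comma-delimited field; `make_lookup`, `filter_by_code`, and the sample records are hypothetical. Note that this only handles exact codes: `RSRCHX` would not match `RSRCH`.

```perl
use strict;
use warnings;

# Build the lookup table from a list with a hash slice. The values are left
# undef, so membership must be tested with exists() rather than truth.
sub make_lookup {
    my %lookup;
    @lookup{@_} = ();
    return \%lookup;
}

# Keep the records whose first field is exactly one of the wanted codes.
sub filter_by_code {
    my ($lookup, @lines) = @_;
    my @hits;
    for my $line (@lines) {
        my ($code) = split /,/, $line;
        next unless defined $code;
        push @hits, $line if exists $lookup->{$code};
    }
    return @hits;
}

my $wanted = make_lookup(qw(RSRCH SALES));
my @hits   = filter_by_code($wanted, 'RSRCH,alice', 'ADMIN,bob', 'SALES,carol');
print "$_\n" for @hits;
```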
    --
    Mike
      I wish it were that simple! What I have are dept code *prefixes* which are not necessarily all the same length. For example, if our department codes were like:

      ABCD123456 DEFG123456 DEFG998800 DEEE11223344

      The code prefixes I have are like:

      ABCD12 DEF DEEE11223

      In other words, they can be any length. I tried using something like this:

      foreach my $pattern (@patterns) {
          if (substr($fields[7], 0, length($pattern)) eq $pattern) {
              ### Code here
              last;
          }
      }
      but that was slightly slower than the regex methods.
        Ouch.

        How many prefixes of each length, though? If you have many of a given length, maybe you could do something like this (writing support code to generate this HOH, of course):

        my %prefixList = (
            3 => { DEF => 1, SRC => 1 },
            6 => { ABCD12 => 1 },
            9 => { DEEE11223 => 1 },
        );
        #....
        # inside the loop: iterate over the lengths (the keys), not the whole hash
        foreach my $length (keys %prefixList) {
            if ($prefixList{$length}->{ substr($fields[7], 0, $length) }) {
                ### code here
                last;
            }
        }
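        The support code to generate that HOH might look like this minimal sketch, which groups any list of prefixes by length. `build_prefix_table` and `has_prefix` are hypothetical names; `has_prefix` wraps the per-length lookup in a sub and iterates `keys`, so only the lengths (not the inner hash references) are passed to substr().

```perl
use strict;
use warnings;

# Group prefixes by length: { length => { prefix => 1, ... }, ... }
sub build_prefix_table {
    my %table;
    $table{ length($_) }{$_} = 1 for @_;
    return \%table;
}

# True if $code starts with any prefix in the table: one hash lookup
# per distinct prefix length, however many prefixes share that length.
sub has_prefix {
    my ($table, $code) = @_;
    for my $length (keys %$table) {
        # substr() simply returns the whole string if $code is shorter.
        return 1 if $table->{$length}{ substr($code, 0, $length) };
    }
    return 0;
}

my $table = build_prefix_table(qw(ABCD12 DEF DEEE11223));
for my $code (qw(ABCD123456 DEFG123456 DEFG998800 DEEE11223344 XYZ0001)) {
    printf "%-12s %s\n", $code, has_prefix($table, $code) ? 'match' : 'no match';
}
```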
        If the set of sizes isn't dynamic, you could make that an AOH, with some possibly empty hashes, I guess... You could also get rid of the loop and inline the check for each length, but I don't think that would save you much.
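        A sketch of the AOH variant, assuming prefix lengths are small positive integers: element N of the array holds the prefixes of length N, and lengths with no prefixes stay undef (the "possibly empty" slots). `build_prefix_array`, `used_lengths`, and `has_prefix_aoh` are hypothetical names. Precomputing the list of used lengths once avoids visiting the empty slots for every record.

```perl
use strict;
use warnings;

# $array->[$n] is a hash of the prefixes that are exactly $n characters long.
sub build_prefix_array {
    my @array;
    $array[ length($_) ]{$_} = 1 for @_;
    return \@array;
}

# Collect the indexes (lengths) that actually hold prefixes.
sub used_lengths {
    my ($array) = @_;
    return grep { defined $array->[$_] } 0 .. $#$array;
}

# Check the code against each used length only.
sub has_prefix_aoh {
    my ($array, $lengths, $code) = @_;
    for my $length (@$lengths) {
        return 1 if $array->[$length]{ substr($code, 0, $length) };
    }
    return 0;
}

my $array   = build_prefix_array(qw(ABCD12 DEF DEEE11223));
my @lengths = used_lengths($array);    # (3, 6, 9) for these prefixes
print join(',', @lengths), "\n";

my $hit = has_prefix_aoh($array, \@lengths, 'DEFG998800');
print $hit ? "match\n" : "no match\n";
```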

        If you're only ever going to have 1 or 2 prefixes of a given length, then this isn't worth it. But given your initial description, I think with a LOT of prefixes and very few different lengths, this could start to pull ahead of the regex.
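        Whether the length-bucketed hash actually pulls ahead depends on the data, so it is worth measuring. A rough harness using the core Benchmark module's `cmpthese` might look like this; the prefixes and codes are synthetic and made up purely for illustration, and `by_loop` and `by_hoh` are hypothetical names. It first checks that the regex, the substr loop, and the HOH all agree before timing them.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up data: many prefixes but only two distinct lengths (6 and 3).
my @prefixes = map { sprintf('P%05d', $_) } 1 .. 200;
push @prefixes, map { sprintf('Q%02d', $_) } 1 .. 50;
my @codes = map { sprintf('P%07d', $_ * 37) } 1 .. 1000;

# Approach 1: one anchored alternation of all prefixes.
my $alt = join '|', map { quotemeta($_) } @prefixes;
my $re  = qr/^(?:$alt)/;

# Approach 2: compare a substr() of the code against every prefix.
sub by_loop {
    my ($code) = @_;
    for my $p (@prefixes) {
        return 1 if substr($code, 0, length $p) eq $p;
    }
    return 0;
}

# Approach 3: hash of hashes keyed by prefix length.
my %by_len;
$by_len{ length($_) }{$_} = 1 for @prefixes;
my @lens = keys %by_len;

sub by_hoh {
    my ($code) = @_;
    for my $len (@lens) {
        return 1 if $by_len{$len}{ substr($code, 0, $len) };
    }
    return 0;
}

# Sanity check: all three approaches must agree before timing them.
my $disagree = 0;
for my $code (@codes) {
    my $r = $code =~ $re ? 1 : 0;
    $disagree++ if $r != by_loop($code) || $r != by_hoh($code);
}
die "approaches disagree on $disagree codes\n" if $disagree;

# Run each approach for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { my $n = grep { $_ =~ $re } @codes },
    loop  => sub { my $n = grep { by_loop($_) } @codes },
    hoh   => sub { my $n = grep { by_hoh($_) } @codes },
});
```

        The numbers will vary with Perl version and data, so rerun it with the real prefixes and codes before drawing conclusions.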
        --
        Mike