
Aha! Then don't use a regex.
# substitute in your dept codes, and a better hash name :)
my %deptCodesICareAbout = ( RSRCH => 1, SALES => 1 );
# substitute in the right FILE, field list, and delimiter...
while (<FILE>) {
    my ($code, @whateverelse) = split /,/;
    if ($deptCodesICareAbout{$code}) {
        # business logic
    }
}
That will be MUCH faster, since each line costs a single hash lookup no matter how many codes you care about.
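To try this without a real FILE, here is a minimal self-contained sketch that reads from an in-memory filehandle. The sample records, the field layout (code first, comma-delimited) and the wanted_records helper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real FILE; the
# department code is assumed to be the first comma-separated field.
my $data = <<'END';
RSRCH,Alice,42
HR,Bob,17
SALES,Carol,99
END

# Department codes of interest, stored as hash keys for a single lookup.
my %deptCodesICareAbout = map { $_ => 1 } qw(RSRCH SALES);

# Hypothetical helper: returns the records whose first field is wanted.
sub wanted_records {
    my ($fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my ($code, @rest) = split /,/, $line;
        push @hits, $line if $deptCodesICareAbout{$code};
    }
    return @hits;
}

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @hits = wanted_records($fh);
close $fh;
print "$_\n" for @hits;
```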
--
Mike

Re: Re: Re: Re: Bulk Regex?
by meetraz (Hermit) on Aug 29, 2002 at 16:44 UTC
    I wish it were that simple! What I have are dept code *prefixes* which are not necessarily all the same length. For example, if our department codes were like:

    ABCD123456 DEFG123456 DEFG998800 DEEE11223344

    The code prefixes I have are like:

    ABCD12 DEF DEEE11223

    In other words, they can be any length. I tried using something like this:

    foreach my $pattern (@patterns) {
        if (substr($fields[7], 0, length($pattern)) eq $pattern) {
            ### Code here
            last;
        }
    }
    but that was slightly slower than the regex methods.
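For comparison, one common way to do the regex side is to fold every prefix into a single anchored alternation, so each field needs only one match attempt. This is a sketch of that general technique under my own assumptions (the matching_prefix helper and the longest-first ordering are mine), not necessarily the regex method benchmarked in this thread.

```perl
use strict;
use warnings;

# Prefixes from the example above.
my @patterns = qw(ABCD12 DEF DEEE11223);

# Build one anchored alternation. quotemeta escapes any regex
# metacharacters; sorting longest-first means that when two prefixes
# could both match, the longer one is tried (and reported) first.
my $alt = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } @patterns;
my $prefix_re = qr/^($alt)/;

# Hypothetical helper: returns the matching prefix, or undef if none.
sub matching_prefix {
    my ($field) = @_;
    return $field =~ $prefix_re ? $1 : undef;
}

for my $code (qw(ABCD123456 DEFG123456 DEFG998800 DEEE11223344 XYZ999)) {
    my $p = matching_prefix($code);
    print "$code => ", (defined $p ? $p : '(none)'), "\n";
}
```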
      Ouch.

      How many prefixes of each length, though? If you have many of a given length, maybe you could do something like this (writing support code to generate this HoH, i.e. hash of hashes, of course):

      my %prefixList = (
          3 => { DEF => 1, SRC => 1 },
          6 => { ABCD12 => 1 },
          9 => { DEEE11223 => 1 },
      );
      #....
      # inside the loop
      foreach my $length (keys %prefixList) {
          if ($prefixList{$length}->{substr($fields[7], 0, $length)}) {
              ### code here
              last;
          }
      }
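Here is a minimal runnable sketch of the HoH approach, including the support code that builds the hash from a flat prefix list. The @prefixes list, the find_prefix helper and the longest-first ordering are my assumptions; fields are passed in directly instead of via $fields[7].

```perl
use strict;
use warnings;

# Hypothetical prefix list; in practice it would come from your data.
my @prefixes = qw(ABCD12 DEF SRC DEEE11223);

# Support code: group the prefixes into a hash of hashes keyed by length,
# giving 3 => { DEF => 1, SRC => 1 }, 6 => { ABCD12 => 1 }, 9 => { DEEE11223 => 1 }.
my %prefixList;
$prefixList{ length($_) }{$_} = 1 for @prefixes;

# Distinct lengths, longest first, so the most specific prefix wins.
my @lengths = sort { $b <=> $a } keys %prefixList;

# Hypothetical helper: one hash probe per distinct length.
# Returns the matching prefix, or undef if none matches.
sub find_prefix {
    my ($field) = @_;
    for my $length (@lengths) {
        next if length($field) < $length;    # too short to carry this prefix
        my $candidate = substr($field, 0, $length);
        return $candidate if $prefixList{$length}{$candidate};
    }
    return;
}

for my $code (qw(ABCD123456 DEFG998800 DEEE11223344 XYZ999)) {
    my $p = find_prefix($code);
    print "$code => ", (defined $p ? $p : '(none)'), "\n";
}
```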
      If the set of sizes isn't dynamic, you could make that an AoH (an array of hashes indexed by prefix length), with some possibly empty hashes, I guess... You could also get rid of the loop and inline the check for each length, but I don't think that would save you much.
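A sketch of the array-of-hashes variant, assuming the prefix list is known up front. The array index is the prefix length, unused lengths get empty hashes, and the lookup skips those. The names @byLength and find_prefix_aoh are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical prefix list; in practice it would come from your data.
my @prefixes = qw(ABCD12 DEF SRC DEEE11223);

# Array of hashes indexed directly by prefix length. Every slot from
# 0 to the longest length gets a hash, empty if no prefix has that length.
my $maxLen   = max map { length($_) } @prefixes;
my @byLength = map { +{} } 0 .. $maxLen;
$byLength[ length($_) ]{$_} = 1 for @prefixes;

# Hypothetical helper: probe lengths from longest to shortest, skipping
# empty slots. Returns the matching prefix, or undef if none matches.
sub find_prefix_aoh {
    my ($field) = @_;
    my $top = length($field) < $maxLen ? length($field) : $maxLen;
    for my $length (reverse 1 .. $top) {
        next unless %{ $byLength[$length] };
        my $candidate = substr($field, 0, $length);
        return $candidate if $byLength[$length]{$candidate};
    }
    return;
}

for my $code (qw(ABCD123456 DEFG123456 DEEE11223344 XYZ999)) {
    my $p = find_prefix_aoh($code);
    print "$code => ", (defined $p ? $p : '(none)'), "\n";
}
```

Skipping the empty slots keeps the number of real probes equal to the number of distinct lengths, the same as the HoH version.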

      If you're only ever going to have 1 or 2 prefixes of a given length, then this isn't worth it. But given your initial description, with a LOT of prefixes and very few different lengths, I think this could start to pull ahead of the regex: the work per line is one hash lookup per distinct length, no matter how many prefixes share that length.
      --
      Mike