Re: log parsing very slow

I'd suggest using a hash instead of (or in addition to) an array; you'll be able to keep track of things a whole lot faster. The other problem is that for each line of the file, you're compiling X regexes, where X is the number of elements in @mask. I would suggest creating regexes beforehand:

my @regexes = map qr{GET (.*$_.*) HTTP/1\.1" 200 [0-9]}, @mask;

Using qr// gives us pre-compiled regexes, and I've added capturing parens to the regex so that the part in between "GET " and " HTTP" is captured to $1. Read on:

use File::Basename;  # provides basename(), if not already loaded

my (%count, @order);

while (<F>) {
  chomp;
  for my $rx (@regexes) {
    if (/$rx/) {
      my $bn = basename($1);
      $count{$bn}++ or push @order, $bn;
    }
  }
}

The %count hash tells you how many times a particular basename was seen, and the @order array keeps them in the order they were found. The only really tricky line is $count{$bn}++ or push @order, $bn which means: "If $count{$bn} is zero before we increment it, push $bn to the @order array." This means you won't get the same element in @order twice.
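
To see that idiom in isolation, here is a minimal sketch. The names in @seen are made up for illustration and stand in for whatever basename() would return from the log:

```perl
use strict;
use warnings;

# Hypothetical input: basenames in the order they might be encountered.
my @seen = qw(a.gif b.gif a.gif c.gif b.gif a.gif);

my (%count, @order);
for my $bn (@seen) {
    # Post-increment returns the OLD value: undef (false) the first time,
    # so the push only happens on the first sighting of $bn.
    $count{$bn}++ or push @order, $bn;
}

# Report in first-seen order.
print "$_: $count{$_}\n" for @order;
```

Each name lands in @order exactly once, while %count keeps the running totals.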

Update: in retrospect, there's probably no harm in producing a single regex, since looping over the regexes can't provide any additional hits. That is, if regex 1 finds a match and regex 2 finds the same match, then regex 1+2 together will provide the same results.

my $alts = join '|', map quotemeta, @mask;
my $rx = qr{GET (.*(?:$alts).*) HTTP/1\.1" 200 [0-9]};
my (%count, @order);

while (<F>) {
  chomp;
  if (/$rx/) {
    my $bn = basename($1);
    $count{$bn}++ or push @order, $bn;
  }
}
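
If you want to convince yourself that the combined regex behaves as described, a rough sketch like the following can help. The masks and log lines are invented for illustration (they are not from the original data), and the loop runs over an array rather than a filehandle so that it is self-contained:

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical masks and log lines, for illustration only.
my @mask  = ('.gif', 'logo');
my @lines = (
    '1.2.3.4 - - [05/Oct/2005:14:00:00 +0000] "GET /img/logo.gif HTTP/1.1" 200 512',
    '1.2.3.4 - - [05/Oct/2005:14:00:01 +0000] "GET /img/xgif.html HTTP/1.1" 200 99',
    '1.2.3.4 - - [05/Oct/2005:14:00:02 +0000] "GET /img/logo.gif HTTP/1.1" 404 0',
    '1.2.3.4 - - [05/Oct/2005:14:00:03 +0000] "GET /pix/a.gif HTTP/1.1" 200 77',
);

# quotemeta makes each mask literal, so '.gif' does not match 'xgif'.
my $alts = join '|', map quotemeta, @mask;
my $rx   = qr{GET (.*(?:$alts).*) HTTP/1\.1" 200 [0-9]};

my %count;
for (@lines) {
    # The single regex is compiled once and applied once per line.
    $count{ basename($1) }++ if /$rx/;
}

print "$_: $count{$_}\n" for sort keys %count;
```

The first line matches both masks but is counted only once, the second is skipped because the dot in '.gif' is taken literally, and the third is skipped because its status is not 200.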

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart


Re^2: log parsing very slow by Anonymous Monk on Oct 05, 2005 at 14:16 UTC
holy mother of perl, this is truly jaw-dropping. the script finishes in a mere 12s... and you axed half of my script ;-) as i don't really care about the order of the entries, another array axed. brilliant. it is now clear to me that compiling regexes is more expensive than flying to paris... i would like to thank the others as well for their time and effort.
Re^3: log parsing very slow by japhy (Canon) on Oct 05, 2005 at 14:35 UTC
The two problems worked in tandem to screw you over bigtime. With more than one keyword being used, you had to compile a regex for EVERY line of the file and EVERY keyword. 200 lines and 2 keywords means 400 regex compilations (although you only really need TWO regexes). And using an array instead of a hash to keep track of uniqueness is, well, the road to madness.