in reply to log parsing very slow

I'd suggest using a hash instead of (or in addition to) an array; you'll be able to keep track of things a whole lot faster. The other problem is that for each line of the file, you're compiling X regexes, where X is the number of elements in @mask. I would suggest creating regexes beforehand:
my @regexes = map qr{GET (.*\Q$_\E.*) HTTP/1\.1" 200 [0-9]}, @mask;
Using qr// gives us pre-compiled regexes, and I've added capturing parens to the regex so that the part in between "GET " and " HTTP" is captured to $1. Read on:
use File::Basename;   # for basename()

my (%count, @order);
while (<F>) {
    chomp;
    for my $rx (@regexes) {
        if (/$rx/) {
            my $bn = basename($1);
            $count{$bn}++ or push @order, $bn;
        }
    }
}
The %count hash tells you how many times a particular basename was seen, and the @order array keeps them in the order they were found. The only really tricky line is $count{$bn}++ or push @order, $bn which means: "If $count{$bn} is zero before we increment it, push $bn to the @order array." This means you won't get the same element in @order twice.
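The `$count{$bn}++ or push @order, $bn` idiom can be tried on its own. In this sketch the list of names is hypothetical; it just stands in for the basenames a log scan would produce.

```perl
use strict;
use warnings;

# Hypothetical basenames, in the order a log scan might produce them.
my @seen = qw(logo.gif index.html logo.gif faq.html index.html logo.gif);

my (%count, @order);
for my $bn (@seen) {
    # Post-increment returns the OLD value, which is false on the first
    # sighting (the key did not exist yet), so push runs only then.
    $count{$bn}++ or push @order, $bn;
}

print join(', ', map { "$_=$count{$_}" } @order), "\n";
# prints "logo.gif=3, index.html=2, faq.html=1"
```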

Update: in retrospect, there's probably no harm in combining the masks into a single regex, since looping over separate regexes can't produce any additional hits. If regex 1 matches a line and regex 2 matches the same line, a single alternation of the two matches it too and captures the same $1; the only difference is that such a line is now counted once instead of once per matching mask.

use File::Basename;   # for basename()

my $alts = join '|', map quotemeta, @mask;   # literal masks, OR'ed together
my $rx = qr{GET (.*(?:$alts).*) HTTP/1\.1" 200 [0-9]};
my (%count, @order);
while (<F>) {
    chomp;
    if (/$rx/) {
        my $bn = basename($1);
        $count{$bn}++ or push @order, $bn;
    }
}
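For experimenting with the single-regex version, here is a self-contained sketch. The masks and the five log lines are invented for illustration, and the sample text is read through an in-memory filehandle instead of the real log file `F`.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical masks and log lines, for illustration only.
my @mask = ('images', 'docs');
my $log = <<'END_LOG';
10.0.0.1 - - [05/Oct/2005:14:16:00 +0000] "GET /images/logo.gif HTTP/1.1" 200 1234
10.0.0.2 - - [05/Oct/2005:14:16:01 +0000] "GET /docs/faq.html HTTP/1.1" 200 512
10.0.0.3 - - [05/Oct/2005:14:16:02 +0000] "GET /images/logo.gif HTTP/1.1" 304 0
10.0.0.4 - - [05/Oct/2005:14:16:03 +0000] "GET /index.html HTTP/1.1" 200 2048
10.0.0.5 - - [05/Oct/2005:14:16:04 +0000] "GET /images/docs/logo.gif HTTP/1.1" 200 99
END_LOG

# One alternation of literal masks, compiled once before the loop.
my $alts = join '|', map { quotemeta } @mask;
my $rx   = qr{GET (.*(?:$alts).*) HTTP/1\.1" 200 [0-9]};

# Read the sample text through an in-memory filehandle, like a real log file.
open my $fh, '<', \$log or die "cannot open in-memory log: $!";

my (%count, @order);
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ $rx;
    my $bn = basename($1);    # $1 is the path between "GET " and " HTTP/1.1"
    $count{$bn}++ or push @order, $bn;
}
close $fh;

print "$_: $count{$_}\n" for @order;
```

Only the first, second and fifth lines match (the third has status 304, the fourth contains no mask), so it prints `logo.gif: 2` and `faq.html: 1`. The fifth line contains both masks but is counted once; the per-mask loop shown earlier would count it twice.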

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Re^2: log parsing very slow
by Anonymous Monk on Oct 05, 2005 at 14:16 UTC
    holy mother of perl, this is truly jaw-dropping.

    the script finishes in a mere 12s... and you axed half of my script ;-)
    as i don't really care about the order of the entries, another array axed. brilliant.

    it is now clear to me that compiling regexes is more expensive than flying to paris...

    i would like to thank the others as well for their time and effort.

      The two problems worked in tandem to screw you over big time. With more than one keyword being used, you had to compile a regex for EVERY line of the file and EVERY keyword. 200 lines and 2 keywords means 400 regex compilations (although you only really need TWO regexes). And using an array instead of a hash to keep track of uniqueness is, well, the road to madness: every "have I seen this?" check has to scan the whole array, whereas a hash answers it with a single lookup.
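The original poster's array code isn't shown in this thread, so the following is only a sketch of the usual array-based approach (a `grep` over everything seen so far) next to the hash-based one. The names are hypothetical; both loops produce the same first-seen list, but the array version rescans its whole list for every input name.

```perl
use strict;
use warnings;

# Hypothetical stream of names with lots of repeats (50 distinct values).
my @names = map { "file" . ($_ % 50) } 1 .. 1000;

# Array-only uniqueness: every check rescans the list built so far.
my @uniq_array;
for my $n (@names) {
    push @uniq_array, $n unless grep { $_ eq $n } @uniq_array;
}

# Hash-based uniqueness: each check is a single hash lookup.
my (%seen, @uniq_hash);
for my $n (@names) {
    $seen{$n}++ or push @uniq_hash, $n;
}

# Both keep first-seen order and give the same list; only the cost differs.
print scalar(@uniq_hash), " unique names\n";    # prints "50 unique names"
```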

      Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker