in reply to log parsing very slow
Using qr// gives us pre-compiled regexes, and I've added capturing parens to the regex so that the part in between "GET " and " HTTP" is captured to $1. Read on:

    my @regexes = map qr{GET (.*$_.*) HTTP/1.1" 200 [0-9]}, @mask;

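To see what ends up in the capture, here is a small sketch. The masks and the log line are made up for illustration (they are not from the original question), so treat it as a demonstration of the technique rather than of the real data.

```perl
use strict;
use warnings;

# Hypothetical masks and log line, invented for this example.
my @mask = ('\.gif', '\.jpg');
my @regexes = map qr{GET (.*$_.*) HTTP/1.1" 200 [0-9]}, @mask;

my $line = '10.0.0.1 - - [05/Oct/2005:14:00:00 +0000] "GET /img/logo.gif HTTP/1.1" 200 512';

my @captured;
for my $rx (@regexes) {
    # In list context a successful match returns the captured groups,
    # and a failed match returns the empty list.
    push @captured, $line =~ $rx;
}
print "$_\n" for @captured;    # prints /img/logo.gif
```

Only the first mask matches this line, so exactly one path is captured.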
The %count hash tells you how many times a particular basename was seen, and the @order array keeps them in the order they were first found. The only really tricky line is $count{$bn}++ or push @order, $bn, which means: "If $count{$bn} is zero before we increment it, push $bn to the @order array." This means you won't get the same element in @order twice.

    use File::Basename;    # provides basename()

    my (%count, @order);
    while (<F>) {
        chomp;
        for my $rx (@regexes) {
            if (/$rx/) {
                my $bn = basename($1);
                $count{$bn}++ or push @order, $bn;
            }
        }
    }

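The count-and-remember-order idiom can be tried on its own, without a log file. This is a minimal sketch; the list of basenames is hypothetical.

```perl
use strict;
use warnings;

my (%count, @order);

# Hypothetical basenames, in the order they might turn up in a log.
for my $bn (qw(logo.gif index.html logo.gif banner.jpg index.html logo.gif)) {
    # Post-increment returns the old value, which is 0 (false) the first
    # time a name is seen, so the push runs once per distinct name.
    $count{$bn}++ or push @order, $bn;
}

printf "%-12s %d\n", $_, $count{$_} for @order;
```

The names come out in first-seen order (logo.gif, index.html, banner.jpg) with counts 3, 2 and 1.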
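Update: in retrospect, there's probably no harm in producing a single regex, since looping over the regexes can't provide any additional hits. That is, if regex 1 finds a match and regex 2 finds the same match, then a combined regex 1|2 will find it too. The one difference is in the tallies: the loop bumps the count once for every regex that matches a line, while the single regex bumps it once per line.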

    # quotemeta makes each mask match literally; @mask is the same array as above
    my $alts = join '|', map quotemeta, @mask;
    my $rx = qr{GET (.*(?:$alts).*) HTTP/1.1" 200 [0-9]};

    my (%count, @order);
    while (<F>) {
        chomp;
        if (/$rx/) {
            my $bn = basename($1);
            $count{$bn}++ or push @order, $bn;
        }
    }
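
For anyone who wants to check the two approaches side by side, here is a rough comparison. The masks and log lines are invented, and unlike the first version above, the per-mask regexes here also go through quotemeta so that both variants treat the masks as literal strings.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical masks and log lines, invented for this comparison.
my @mask  = ('logo', '.gif');
my @lines = (
    '"GET /img/logo.gif HTTP/1.1" 200 512',
    '"GET /img/photo.gif HTTP/1.1" 200 2048',
    '"GET /missing.gif HTTP/1.1" 404 0',
);

# One compiled regex per mask.
my @regexes = map { my $m = quotemeta; qr{GET (.*$m.*) HTTP/1.1" 200 [0-9]} } @mask;

# One combined regex built from an alternation of all masks.
my $alts   = join '|', map quotemeta, @mask;
my $single = qr{GET (.*(?:$alts).*) HTTP/1.1" 200 [0-9]};

my (%loop_count, %single_count);
for my $line (@lines) {
    for my $rx (@regexes) {
        # Counted once for every mask that matches this line.
        $loop_count{ basename($1) }++ if $line =~ $rx;
    }
    # Counted at most once per line.
    $single_count{ basename($1) }++ if $line =~ $single;
}

printf "%s: loop=%d single=%d\n", $_, $loop_count{$_}, $single_count{$_}
    for sort keys %single_count;
```

Both variants find the same files (logo.gif and photo.gif, with the 404 line skipped), but logo.gif matches two masks, so the loop counts it twice and the single regex counts it once.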
Re^2: log parsing very slow
by Anonymous Monk on Oct 05, 2005 at 14:16 UTC
by japhy (Canon) on Oct 05, 2005 at 14:35 UTC