in reply to log parsing very slow
    foreach (@count) {
        if ($_->[0] eq $bn) {
            $_->[1]++;
            $found = 1;
            last;
        }
    }
    if ($found == 0) {
        push @count, [ $bn, 1 ];
    }

becomes

    ++$count{$bn};

Next thing: Precompile your RE and do it for all masks:

    my $regex = join '|', @mask;   # Now you have a list of alternatives
    $regex = qr<GET (.*\b(?:$regex)\b.*) HTTP/1.1" 200 [0-9].*>;   # now the regex is precompiled

(Untested!)

    if (/$regex/) {
        s/.*GET //;
        s/ HTTP.*//;
        :
        :

is reduced to

    if (s/$regex/$1/) {
        :
        :

Putting it all together and eliminating the unnecessary replace and chomp will give you this script, which should (UNTESTED) give the same result as yours, except for the sequence of filenames, which will be sorted:
    use strict;
    use File::Basename;

    my %count;
    my @mask = (
        '/some/path/',
        'some/other/path',
        'another/path',
    );

    my $regex = join '|', @mask;   # Now you have a list of alternatives
    $regex = qr<GET (.*\b(?:$regex)\b.*) HTTP/1.1" 200 [0-9].*>;   # now the regex is precompiled
    # N.B. I added \b to make the filename only match on word boundaries.
    # So some/path won't match anothersome/path
    # but it will match another/some/path

    open(F, '<', 'access.log') or die "Cannot open access.log: $!";
    while (<F>) {
        if (/$regex/) {
            ++$count{basename($1)};   # one hash increment instead of scanning an array
        }
    }
    close F;

    foreach (sort keys %count) {
        print $_, " = ", $count{$_}, "\n";
    }
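Since the script is untested, here is a minimal sketch for checking the combined regex against a few in-memory lines before running it on a real access.log. The masks, the sample log lines and the `quotemeta` step are my own assumptions and not part of the original script. `quotemeta` only ensures that characters such as `.` in a mask are matched literally.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical masks, invented for this check.
my @mask = ('some/path', 'another/path');

# Escape regex metacharacters in each mask, then join them into one alternation.
my $alt   = join '|', map { quotemeta } @mask;
my $regex = qr<GET (.*\b(?:$alt)\b.*) HTTP/1.1" 200 [0-9].*>;

# Hypothetical log lines: two hits on one file, one hit on another,
# one path that must fail the \b test, and one non-200 response.
my @lines = (
    '10.0.0.1 - - [01/Jan/2004:00:00:01 +0000] "GET /some/path/a.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2004:00:00:02 +0000] "GET /some/path/a.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [01/Jan/2004:00:00:03 +0000] "GET /another/path/b.gif HTTP/1.1" 200 90',
    '10.0.0.4 - - [01/Jan/2004:00:00:04 +0000] "GET /anothersome/path/c.html HTTP/1.1" 200 10',
    '10.0.0.5 - - [01/Jan/2004:00:00:05 +0000] "GET /some/path/d.html HTTP/1.1" 404 10',
);

my %count;
for my $line (@lines) {
    # $1 holds the requested path; count by file name only.
    ++$count{ basename($1) } if $line =~ $regex;
}

print "$_ = $count{$_}\n" for sort keys %count;
```

With these sample lines only a.html (twice) and b.gif (once) should be counted. The anothersome/path request is rejected by the word boundary and the 404 line by the status check.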