Here is how to do it by hand. First find the logs (try locate access_log if they are not in the usual place). A standard log line looks something like this:
blah blah blah GET /some/path/to/file.htm blah blah
So all we do is get a file list from glob(), iterate over it, read each file, use a regex to capture the path that follows GET or POST, and count each match in a hash. Then we print the counts.
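The core of the approach (match the path after GET or POST, then bump a hash counter) can be tried without any real log files. Here is a minimal sketch that wraps that loop in a sub and feeds it a few in-memory lines. The count_prefixes sub and the sample lines are hypothetical, made up for illustration. The regex is the same one the script below uses.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): tally how many
# request lines start with one of the given path prefixes.
# Returns a hashref of prefix => count and the total number of lines seen.
sub count_prefixes {
    my ($prefixes, @lines) = @_;

    # One alternation of the literal prefixes, built the same way as in the script.
    my $re = join '|', map { quotemeta($_) } @$prefixes;
    $re = qr/$re/;

    my %count;
    my $total = 0;
    for my $line (@lines) {
        $total++;                                    # every line counts towards the total
        next unless $line =~ m/(?:GET|POST) ($re)/;  # capture the prefix right after the method
        $count{$1}++;
    }
    return (\%count, $total);
}

# Made-up sample lines in the "blah GET /path blah" shape shown above.
my @sample = (
    'blah blah GET /images/logo.gif blah blah',
    'blah blah GET /modperl/app?id=1 blah blah',
    'blah blah POST /cgi-bin/form.pl blah blah',
    'blah blah GET /index.html blah blah',
);

my ($count, $total) = count_prefixes([qw(/modperl/ /cgi-bin/ /images/)], @sample);
for my $prefix (sort keys %$count) {
    printf "%-20s %8d/%-8d (%.2f%%)\n",
        $prefix, $count->{$prefix}, $total, 100 * $count->{$prefix} / $total;
}
```

The /index.html line is counted in the total but not tallied, because nothing in the prefix list follows its GET. That is also why the percentages in the real run below do not add up to 100.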
[root@devel3 root]# cat ./simple_log_parse.pl
#!/usr/bin/perl -w
use strict;
my $LOG_PATH = '/var/log/httpd/access_log';
my @FIND = qw(
/modperl/
/cgi-bin/
/images/
);
my $re = join '|', map{quotemeta}@FIND;
$re = qr/$re/;
my @logs = glob("$LOG_PATH*");
my %hash;
my $total = 0;
for my $log(@logs) {
print "Processing $log\n";
open my $fh, '<', $log or die "Can't open $log: $!\n";
while (<$fh>) {
$total++;    # every line counts towards the total
next unless m/(?:GET|POST) ($re)/;
$hash{$1}++;    # tally the matched prefix
}
close $fh;
}
print "\n\nResults\n";
for ( keys %hash ) {
printf "%-20s %8d/%-8d (%.2f%%)\n", $_, $hash{$_}, $total, (100*$hash{$_}/$total);
}
[root@devel3 root]# ./simple_log_parse.pl
Processing /var/log/httpd/access_log
Processing /var/log/httpd/access_log.1
Processing /var/log/httpd/access_log.2
Processing /var/log/httpd/access_log.3
Processing /var/log/httpd/access_log.4
Processing /var/log/httpd/access_log.5
Processing /var/log/httpd/access_log.6
Results
/images/ 170212/376847 (45.17%)
/modperl/ 186210/376847 (49.41%)
/cgi-bin/ 5366/376847 (1.42%)
[root@devel3 root]#
Here is a variation on the theme that counts every directory path (with query strings and file names stripped) and presents them sorted by hits:
[root@devel3 root]# cat ./simple_log_parse2.pl
#!/usr/bin/perl -w
use strict;
my $LOG_PATH = '/var/log/httpd/access_log';
my @logs = glob("$LOG_PATH*");
my %hash;
my $total = 0;
for my $log(@logs) {
print "Processing $log\n";
open my $fh, '<', $log or die "Can't open $log: $!\n";
while (<$fh>) {
$total++;
next unless m/(?:GET|POST) (\S+)/;
my $path = $1;
($path) = split /\?/, $path;    # drop any query string
$path =~ s![^/]+$!!;            # drop the file name, keep the directory
$hash{$path}++;
}
close $fh;
}
print "\n\nResults\n";
for ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
printf "%-20s %8d/%-8d (%.2f%%)\n", $_, $hash{$_}, $total, (100*$hash{$_}/$total);
}
[root@devel3 root]#
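The two lines that turn a request into a directory key are easy to check in isolation. This minimal sketch pulls them into a hypothetical request_dir sub (not part of the script above) and runs it over a few made-up paths, so you can see exactly what ends up as a hash key.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the normalisation in simple_log_parse2.pl:
# drop any query string, then drop the trailing file name so that only
# the directory part (ending in "/") is left to count.
sub request_dir {
    my ($path) = @_;
    ($path) = split /\?/, $path;   # "/cgi-bin/form.pl?name=x" -> "/cgi-bin/form.pl"
    $path =~ s![^/]+$!!;           # "/cgi-bin/form.pl"        -> "/cgi-bin/"
    return $path;
}

# Made-up request paths, just to show the resulting keys.
for my $p ('/cgi-bin/form.pl?name=x', '/images/icons/', '/index.html') {
    printf "%-26s => %s\n", $p, request_dir($p);
}
```

Everything requested from the document root collapses to "/", and a path that already ends in "/" (a directory index request) is kept as it is.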
PS: There are stacks of log analysis packages that will do a far more complete job.