http://qs1969.pair.com?node_id=315089

CHRYSt has asked for the wisdom of the Perl Monks concerning the following question:

OK, this seems simple, but I can't find any code out there that already does what I need, and I'm still not very good with Perl.

What I need to do is generate a report once a month that shows the number of hits to specific directories. Basically, all I need is:

http://www.domain.com/directory/subdir1/ 500 hits
http://www.domain.com/directory/subdir2/ 550 hits
etc.

I've looked into Apache::ParseLog, but it doesn't work. Every time I try to open my log file ( using $log = $base->getCustomLog("combined"); ), I get

Apache::ParseLog::getCustomLog: combined does not exist, Exiting at report.pl line 10.

Can anyone help?

Re: Yet another Apache log question
by barrd (Canon) on Dec 16, 2003 at 17:35 UTC
    Hi CHRYSt,
    Apache::ParseLog::getCustomLog: combined does not exist...
    Did you put in a full path to the 'combined' log? i.e. something along the lines of:
    $log = $base->getCustomLog('/var/log/apache/combined');

    That, I think, could be the problem...?

      Unfortunately not.

      getCustomLog("value") requires the nickname of the log, not a path. It appears that the module parses the Apache conf file and determines the log path from the CustomLog directive that names "combined" as its nickname.
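
      For reference, a nickname such as "combined" is normally defined by a LogFormat directive and then used by a CustomLog directive. A typical pair looks something like the sketch below; the log path is only an assumed example and may differ from what is in your conf file:

      ```apache
      # Defines the log format nicknamed "combined"
      LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
      # Writes the access log using that nickname (path is an assumed example)
      CustomLog logs/access_log combined
      ```

      If the conf file being parsed has no CustomLog line ending in that nickname, that would be consistent with the "does not exist" error.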
        Aah... pooh ;)

        OK, could you post the CustomLog directive line from your conf file so we can have a look?

        Ta

Re: Yet another Apache log question
by tachyon (Chancellor) on Dec 17, 2003 at 02:09 UTC

    Here is how to do it by hand. You need to find the logs (try locate access_log if they are not in the usual place). A standard log line looks like

    blah blah blah GET /some/path/to/file.htm blah blah

    So all we do is get a file list from glob(), iterate over it, read each file, use a RE to get the bit after the GET or POST and count in a hash. Then we print it out.

    [root@devel3 root]# cat ./simple_log_parse.pl
    #!/usr/bin/perl -w
    use strict;
    my $LOG_PATH = '/var/log/httpd/access_log';
    my @FIND = qw( /modperl/ /cgi-bin/ /images/ );
    my $re = join '|', map{quotemeta}@FIND;
    $re = qr/$re/;
    my @logs = glob("$LOG_PATH*");
    my %hash;
    my $total = 0;
    for my $log(@logs) {
        print "Processing $log\n";
        open LOG, $log or die "Can't open $log $!\n";
        while (<LOG>) {
            $total++;
            next unless m/(?:GET|POST) ($re)/;
            $hash{$1}++;
        }
        close LOG;
    }
    print "\n\nResults\n";
    for ( keys %hash ) {
        printf "%-20s %8d/%-8d (%.2f%%)\n", $_, $hash{$_}, $total, (100*$hash{$_}/$total);
    }
    [root@devel3 root]# ./simple_log_parse.pl
    Processing /var/log/httpd/access_log
    Processing /var/log/httpd/access_log.1
    Processing /var/log/httpd/access_log.2
    Processing /var/log/httpd/access_log.3
    Processing /var/log/httpd/access_log.4
    Processing /var/log/httpd/access_log.5
    Processing /var/log/httpd/access_log.6


    Results
    /images/               170212/376847   (45.17%)
    /modperl/              186210/376847   (49.41%)
    /cgi-bin/                5366/376847   (1.42%)
    [root@devel3 root]#

    Here is a variation on the theme that does all your paths, and presents them sorted by hits:

    [root@devel3 root]# cat ./simple_log_parse2.pl
    #!/usr/bin/perl -w
    use strict;
    my $LOG_PATH = '/var/log/httpd/access_log';
    my @logs = glob("$LOG_PATH*");
    my %hash;
    my $total = 0;
    for my $log(@logs) {
        print "Processing $log\n";
        open LOG, $log or die "Can't open $log $!\n";
        while (<LOG>) {
            $total++;
            next unless m/(?:GET|POST) ([^\s]+)/;
            my $path = $1;
            ($path) = split /\?/, $path;
            $path =~ s![^/]+$!!;
            $hash{$path}++;
        }
        close LOG;
    }
    print "\n\nResults\n";
    for ( sort { $hash{$b} <=> $hash{$a} } keys %hash ) {
        printf "%-20s %8d/%-8d (%.2f%%)\n", $_, $hash{$_}, $total, (100*$hash{$_}/$total);
    }
    [root@devel3 root]#
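
    For output in the "URL N hits" shape the question asks for, the same counting idea can be wrapped up roughly as below. This is only a sketch: the base URL and directory names are lifted from the question, count_hits is a made-up helper name, and it assumes the request line looks like the sample log line above. Log files are passed on the command line rather than found with glob().

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumed values taken from the question; adjust for your site.
    my $BASE_URL = 'http://www.domain.com';
    my @DIRS     = qw( /directory/subdir1/ /directory/subdir2/ );

    # Hypothetical helper: count requests whose path starts with one of @dirs.
    # Takes an open filehandle and a list of directory prefixes, and
    # returns a hash reference mapping each prefix to its hit count.
    sub count_hits {
        my ($fh, @dirs) = @_;
        my $alt  = join '|', map { quotemeta } @dirs;
        my $re   = qr/(?:GET|POST) ($alt)/;
        my %hits = map { $_ => 0 } @dirs;
        while (my $line = <$fh>) {
            $hits{$1}++ if $line =~ $re;
        }
        return \%hits;
    }

    # Sum the counts over every log file named on the command line.
    my %total = map { $_ => 0 } @DIRS;
    for my $log (@ARGV) {
        open my $fh, '<', $log or die "Can't open $log: $!\n";
        my $hits = count_hits($fh, @DIRS);
        $total{$_} += $hits->{$_} for @DIRS;
        close $fh;
    }

    # Print one line per directory, but only if log files were given.
    if (@ARGV) {
        print "$BASE_URL$_ $total{$_} hits\n" for @DIRS;
    }
    ```

    Run it as something like: perl report.pl /var/log/httpd/access_log* (report.pl being whatever you call the script). Directories with no matching requests are still listed, with 0 hits.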

    PS There are stacks of log analysis programs that will do a far more complete job.

    cheers

    tachyon