Category: CGI
Author/Contact Info George_Sherston
Description: I needed to be able to look at the access logs for the small web sites I run. There's not enough traffic to justify the all-singing, all-dancing model; I just wanted something that would extract the lines I was interested in from the access_log file and show them in a helpful way in the browser. I reckoned it would take me as long to find one as to write one, and writing one would be marginally more fun and give me the chance to get another code critique from friendly monks, so here it is. Maybe in the future someone else will want exactly this level of simplicity. If you're that person, look at the minimal comments to see how to 'configure' it (if that isn't too grand a term) and you're off to the races. Enjoy!
#!/usr/bin/perl -w

use strict;
use CGI qw(:standard);
my $domain = param('Domain');    # read the form value before building the page

get_domain();
do_log() if defined $domain and length $domain;

sub
get_domain
{
    my $title;
    if ($domain) {$title = "access log for domain $domain"}
    else {$title = 'access log: select domain'}
    print
        header,
        start_html(-title=>$title),
        start_form(
            -method=>'POST',
            -action=>'access.pl',
        ),
        hr,
        'Enter Domain Name: ',
        textfield('Domain'),
        br,br,
        submit,
        end_form,
        hr,
        end_html;
}

sub
do_log 
{
    my $file = '/home/htdocs/logs/access_log';    # put your log file name and path here
    my @ignores = (    # suppress log entries that match any of these:
        '^62\.253\.128\.5',
        'Scooter_trk',
        'css HTTP',
        'bmp HTTP',
        'gif HTTP',
        'htm HTTP',
        'jpg HTTP',
        'robots\.txt',
        'http:\/\/blogdex\.media\.mit\.edu\/\/',
    );
    my $ignores = join ("|",@ignores);
    my @log;
    open LG, '<', $file or die "can't open $file: $!";
    while (<LG>) {
        next if /$ignores/;
        # \Q...\E keeps regex metacharacters in the submitted name literal
        unshift @log, $_ if /\Q$domain\E/;
    }
    close LG;
    my $count = 0;
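    my $lastlog = '';
    while ($count < @log) {
        # skip lines the pattern does not match, so stale $1..$5 are never printed
        unless ($log[$count] =~ /^(.*?) - - \[(.*?)].*?"GET \/(.*?) HTTP.*?".*?"(.*?)".*?".*?" (.*)$/) {
            $count ++;
            next;
        }
        if ($lastlog ne $1) {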
            print 
                hr,
                substr ($2, 0, 17)," &nbsp;&nbsp;&nbsp;&nbsp; ",$1,
                br;
        }
        $lastlog = $1;
        print "&nbsp;&nbsp;",b("$5/$3");
        print "&nbsp;&nbsp;(from $4)" unless $4 =~ /$5/;
        print br;
        $count ++;
    }
}
Re: Access Log Reader
by dws (Chancellor) on Oct 14, 2001 at 08:15 UTC
    A couple of suggestions
    • Keep configuration details at the top of the script, rather than expecting people to read all of the code to find where they need to change things like file paths.
    • Anticipate large logs. Assuming you can fit all or most of a log into memory works fine until you get slashdotted.
    • Test to see if your regex does, in fact, match. There are fringe cases that I've seen show up in my logs that would break the regexp you're using.
    • Assuming that the "userid" field will be "-" works almost all of the time. When it doesn't, don't assume that the userid won't contain a blank.
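One way those four suggestions might fit together is sketched below. It is not the script's actual code: parse_line, recent_entries and $MAX_ENTRIES are hypothetical names, and the field layout (combined log format plus a trailing host field) is only inferred from the regex in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Configuration in one place at the top. Both values are placeholders.
my $LOG_FILE    = '/home/htdocs/logs/access_log';
my $MAX_ENTRIES = 500;    # cap on how many matching lines are held in memory

# Hypothetical helper: parse one log line. Assumes combined log format with
# a trailing host field. Returns a hash ref, or nothing if the line does
# not match, so the caller can test for success.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+)\s+                     # client address
        \S+\s+                        # identd field, usually "-"
        (.*?)\s+                      # userid: often "-", may contain blanks
        \[([^\]]+)\]\s+               # timestamp
        "GET\s+/(\S*)\s+HTTP[^"]*"    # request path
        \s+\d+\s+\S+\s+               # status and byte count
        "([^"]*)"\s+                  # referer
        "[^"]*"\s+                    # user agent
        (\S+)\s*\z                    # trailing host field (assumed)
    }x;
    return { ip => $1, user => $2, time => $3,
             path => $4, referer => $5, host => $6 };
}

# Hypothetical helper: read a filehandle line by line and keep only the
# newest $max entries for $domain, newest first. Also counts the lines
# that failed to parse instead of silently printing stale captures.
sub recent_entries {
    my ($fh, $domain, $max) = @_;
    my @recent;
    my $skipped = 0;
    while (my $line = <$fh>) {
        my $entry = parse_line($line);
        if (!$entry) { $skipped++; next }
        next unless $entry->{host} =~ /\Q$domain\E/;
        unshift @recent, $entry;
        pop @recent if @recent > $max;    # drop the oldest entry
    }
    return (\@recent, $skipped);
}
```

To use it, open $LOG_FILE for reading and call recent_entries($fh, $domain, $MAX_ENTRIES). Memory use then depends on $MAX_ENTRIES rather than on the size of the log.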
      Thanks for those comments, particularly the first one. This puzzled me: I originally put the config vars, as you suggest, at the top of the script. But then I thought I shouldn't do that, because the received wisdom seems to be that one shouldn't allow proliferation of what are effectively global variables. I suppose in a script this size that doesn't matter. But actually, what this brings home to me is that I don't really understand why it is a bad idea to proliferate globals defined for the scope of the whole script. So I don't know how to balance this against the clarity advantages of having them at the top. Any comments on this topic gratefully received.

      § George Sherston
        This puzzled me: I originally put the config vars, as you suggest, at the top of the script. But then I thought I shouldn't do that, because the received wisdom seems to be that one shouldn't allow proliferation of what are effectively global variables.

        The general wisdom is that one shouldn't proliferate global state. Configuration items aren't stateful. They are essentially read-only, named constants. Non-constant global variables, particularly when they're set frequently from all over a program, can make that program wickedly hard to understand. Not so with constants.

        If you want to be doubly righteous, you can use the constant pragma to prevent someone from accidentally writing to your configuration items.
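A minimal sketch of what that might look like with the constant pragma, assuming the path and a few of the ignore patterns from the original script. The names LOG_FILE, IGNORES and is_ignored are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Configuration as named constants, grouped at the top of the script.
use constant LOG_FILE => '/home/htdocs/logs/access_log';
use constant IGNORES  => ('^62\.253\.128\.5', 'Scooter_trk', 'robots\.txt');

# A list constant returns a list, so it can be joined like an array.
my $ignore_re = join '|', IGNORES;

# Hypothetical helper: true if a log line matches any ignore pattern.
sub is_ignored {
    my ($line) = @_;
    return $line =~ /$ignore_re/ ? 1 : 0;
}
```

Constants do not interpolate inside double-quoted strings, so an error message would need concatenation, as in "can't open " . LOG_FILE . ": $!". An assignment such as LOG_FILE = 'x' is rejected at compile time, which is the protection described above.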

Re: Access Log Reader
by George_Sherston (Vicar) on Oct 14, 2001 at 00:43 UTC
    Update: other ways to do the same thing

    ajt /msg'd me to say that www.analog.cx gives away a fully-featured log analyser - and indeed they do. It looks very good and I shall upgrade to it when / if traffic on my sites warrants.

    jryan has written a slightly specialised log analyser with an interesting take on sorting the entries, which instead of placing in the code catacombs he has modestly hidden here.

    I'll add other such links here as they come in... as no doubt they will :)

    § George Sherston