snra_perl has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow monks,

I am writing a simple parser for Tomcat's localhost_access.log file to get the number of HTTP requests, the number of unique files requested, etc. Here is the code snippet:

    use strict;
    use warnings;
    use Data::Dumper::Simple;

    print "Welcome to Tomcat Access Log Parser\n";

    my $LogFile = "C:\\Documents and Settings\\snra\\Desktop\\localhost_access_log.txt";

    # Counters for the statistics (declared together; not filled in yet)
    my ( $http_code_200_count, $http_code_404_count, $unique_ip_count, $unique_ip_journal_mf );

    my @response_array = ();

    # Field names, in the same order as the capture groups in the pattern
    my @attributelib = qw(host logname date method url code size referrer agent);

    # host, ident (skipped), logname, [date tz], "method url protocol", code, size, "referrer", "agent"
    my $TomcatLogFormat =
        qr{^([^ ]+) [^ ]+ ([^/\[]+) \[([^ ]+) [^ ]+\] "([^ ]+) ([^ ]+)(?: [^"]+|)" ([\d-]+) ([\d-]+) "(.*?)" "([^"]*)"};

    open( my $log, '<', $LogFile )
        or die "Couldn't open tomcat access log file \"$LogFile\" : $!";

    while ( my $line = <$log> ) {
        chomp $line;
        $line =~ s/\r$//;    # strip a Windows CR left over after chomp

        my @field = $line =~ $TomcatLogFormat;
        if ( !@field ) {
            print "Line did not match the format !!!! - $line\n";
            next;            # skip the line instead of storing an empty hash
        }

        my %single_response_map;
        @single_response_map{@attributelib} = @field;    # hash slice: name => value
        push @response_array, \%single_response_map;
    }
    close $log;

    print Dumper @response_array;

This dumps an array of hashes. For example, each element of the array looks like this:

    {
      'date'     => '15/Dec/2010:12:36:21',
      'referrer' => '-',
      'size'     => '1046',
      'host'     => '192.0.0.222',
      'logname'  => '-',
      'agent'    => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
      'url'      => '/folder/File1/data1.txt',
      'method'   => 'GET',
      'code'     => '200'
    }
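To check that the capture groups line up with the field names before running a whole file, the pattern can be tried on a single line. This is a sketch, not the poster's code: the log line is hypothetical (reconstructed from the dumped hash; the `+0530` offset and `HTTP/1.1` protocol are assumptions), and the pattern is the `$TomcatLogFormat` string rewritten as a `qr//` literal so nothing needs double escaping. It also uses `[\d-]` instead of `[\d|-]`, since a `|` inside a character class matches a literal pipe rather than meaning "or".

```perl
use strict;
use warnings;

# Hypothetical sample line in Tomcat's "combined" layout; the timezone
# offset and protocol are assumed, not taken from the original post.
my $line = '192.0.0.222 - - [15/Dec/2010:12:36:21 +0530] '
         . '"GET /folder/File1/data1.txt HTTP/1.1" 200 1046 "-" '
         . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"';

# Field names in the order the capture groups appear in the pattern.
my @names = qw(host logname date method url code size referrer agent);

my $tomcat_log_format =
    qr{^([^ ]+) [^ ]+ ([^/\[]+) \[([^ ]+) [^ ]+\] "([^ ]+) ([^ ]+)(?: [^"]+|)" ([\d-]+) ([\d-]+) "(.*?)" "([^"]*)"};

my %record;
if ( my @field = $line =~ $tomcat_log_format ) {
    @record{@names} = @field;    # hash slice: name => captured value
}
else {
    warn "Line did not match the format: $line\n";
}

print "$_ => $record{$_}\n" for @names;
```

If a field lands under the wrong name, the group order in the pattern and the order of `@names` have drifted apart.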

Each hash corresponds to one HTTP request in the log file. I need the count of unique IP addresses, the count of unique URLs, and the count of unique requests for each HTTP status code. Is there a quick way to get these counts from this array of hashes? Otherwise I will have to iterate over it myself with many flags in place.
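A minimal sketch of the usual hash-counting approach, assuming `@response_array` holds hashes with `host`, `url` and `code` keys as in the dump above. The records here are made-up sample data, and the names `%ip_seen`, `%url_seen` and `%code_count` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical records shaped like the dumped hashes (only the fields
# used here are filled in).
my @response_array = (
    { host => '192.0.0.222', url => '/folder/File1/data1.txt', code => '200' },
    { host => '192.0.0.222', url => '/folder/File1/data2.txt', code => '200' },
    { host => '192.0.0.17',  url => '/folder/File1/data1.txt', code => '404' },
    { host => '192.0.0.17',  url => '/folder/File1/data1.txt', code => '200' },
);

# One hash per question: the item of interest is the key, the value a tally.
my ( %ip_seen, %url_seen, %code_count );
for my $r (@response_array) {
    $ip_seen{ $r->{host} }++;
    $url_seen{ $r->{url} }++;
    $code_count{ $r->{code} }++;
}

# keys in scalar context returns the number of distinct keys.
my $unique_ips  = keys %ip_seen;
my $unique_urls = keys %url_seen;

print "Unique IP addresses: $unique_ips\n";
print "Unique URLs: $unique_urls\n";
print "Requests with status $_: $code_count{$_}\n" for sort keys %code_count;
```

The same three increments could also go directly inside the `while` loop over the log, in which case the individual records never need to be stored at all.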

Thanks in Advance.

Re: Parsing tomcat access log
by marto (Cardinal) on Dec 16, 2010 at 16:01 UTC

    Have you considered looking at what AWStats has to offer? It's written in Perl.

Re: Parsing tomcat access log
by Anonyrnous Monk (Hermit) on Dec 16, 2010 at 15:10 UTC
    I need to get the count of unique IP addresses , count of unique URLs and...

    For counting unique things, it's usually more appropriate to use the item of interest (such as the URLs) as the hash's key...

    After having put all items in the hash, the number of keys indicates the number of unique items.

    my @things = qw(foo bar foo baz baz);
    my %hash;
    $hash{$_}++ for @things;
    print scalar keys %hash;    # 3
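If "unique requests based on different HTTP status codes" means distinct URLs per status code, the same idiom can be nested one level deeper. This is a sketch under that interpretation, with hypothetical records in the shape the parser produces and an illustrative `%urls_by_code` name.

```perl
use strict;
use warnings;

# Hypothetical records in the shape the parser produces.
my @response_array = (
    { url => '/a.txt',       code => '200' },
    { url => '/a.txt',       code => '200' },
    { url => '/b.txt',       code => '200' },
    { url => '/missing.txt', code => '404' },
);

# Two-level hash: status code => { url => hit count }.
my %urls_by_code;
$urls_by_code{ $_->{code} }{ $_->{url} }++ for @response_array;

for my $code ( sort keys %urls_by_code ) {
    my $unique = keys %{ $urls_by_code{$code} };    # distinct URLs for this code
    print "$code: $unique unique URL(s)\n";
}
```

Autovivification creates the inner hash the first time a status code is seen, so no existence checks or flags are needed.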