snra_perl has asked for the wisdom of the Perl Monks concerning the following question:
I am writing a simple parser for the Tomcat localhost_access.log file to get the number of HTTP requests, the number of unique files requested, etc. Here is the code snippet:
use Data::Dumper::Simple;

print "Welcome to Tomcat Access Log Parser\n";

$LogFile = "C:\\Documents and Settings\\snra\\Desktop\\localhost_access_log.txt";

# Parentheses are required so that "my" declares every variable in the list.
my ( $http_code_200_count, $http_code_404_count, $unique_ip_count, $unique_ip_journal_mf );
my @response_array = ();

use vars qw/
  $element_host $element_logname $element_date $element_method
  $element_url $element_code $element_size $element_referrer $element_agent
/;

my @field = ();

# Hash keys, in the same order as the capture groups in $TomcatLogFormat.
@attriubtelib = ( 'host', 'logname', 'date', 'method', 'url', 'code', 'size', 'referrer', 'agent' );

$TomcatLogFormat = '';
# $PerlParsingFormat="([^ ]+) [^ ]+ ([^\\/\\[]+) \\[([^ ]+) [^ ]+\\] \\\"([^ ]+) (.+) [^\\\"]+\\\" ([\\d|-]+) ([\\d|-]+) \\\"(.*?)\\\" \\\"([^\\\"]*)\\\"";
$TomcatLogFormat = "([^ ]+) [^ ]+ ([^\\/\\[]+) \\[([^ ]+) [^ ]+\\] \\\"([^ ]+) ([^ ]+)(?: [^\\\"]+|)\\\" ([\\d|-]+) ([\\d|-]+) \\\"(.*?)\\\" \\\"([^\\\"]*)\\\"";

# Index of each field in @field.
$element_host     = 0;
$element_logname  = 1;
$element_date     = 2;
$element_method   = 3;
$element_url      = 4;
$element_code     = 5;
$element_size     = 6;
$element_referrer = 7;
$element_agent    = 8;

# Compile the pattern once, anchored at the start of the line.
$TomcatLogFormat = qr/^$TomcatLogFormat/;

open( LOG, "$LogFile" )
  || die("Couldn't open tomcat access log file \"$LogFile\" : $!");

while ( $line = <LOG> ) {
    my %single_response_map = ();
    chomp $line;
    $line =~ s/\r$//;    # strip a trailing carriage return (Windows line endings)
    if ( !( @field = map( /$TomcatLogFormat/, $line ) ) ) {
        print "Line did not match the format !!!! - $line\n";
        next;            # do not push an empty hash for an unparsed line
    }
    for ( $i = 0 ; $i < @field ; ++$i ) {
        $single_response_map{ $attriubtelib[$i] } = $field[$i];
    }
    push @response_array, \%single_response_map;
}
close(LOG);

print Dumper @response_array;
This will dump an array of hashes. For example, each element of the array will look like this:
{
  'date' => '15/Dec/2010:12:36:21',
  'referrer' => '-',
  'size' => '1046',
  'host' => '192.0.0.222',
  'logname' => '-',
  'agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
  'url' => '/folder/File1/data1.txt',
  'method' => 'GET',
  'code' => '200'
}
Each hash corresponds to one HTTP request in the log file. I need the count of unique IP addresses, the count of unique URLs, and the count of unique requests for each HTTP status code. Please help me get these counts from this array of hashes using any quick method. Otherwise I will have to iterate over it separately for each count, with many flags in place.
Thanks in Advance.
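One common way to get such counts without flags is to use hashes as counters: each distinct value becomes a key, so the number of keys is the number of unique values. The following is only a minimal sketch, not a tested solution for the real log. The three sample records are hypothetical stand-ins for the contents of @response_array, and the names %ip_seen, %url_seen, %code_count and %urls_by_code are made up for illustration. It also assumes that "unique requests per status code" means either the number of requests or the number of distinct URLs for each code, so it reports both.

```perl
use strict;
use warnings;

# Hypothetical sample records, shaped like the hashes in @response_array.
my @response_array = (
    { host => '192.0.0.222', url => '/folder/File1/data1.txt', code => '200' },
    { host => '192.0.0.222', url => '/folder/File2/data2.txt', code => '404' },
    { host => '192.0.0.223', url => '/folder/File1/data1.txt', code => '200' },
);

# Counter hashes (illustrative names): one key per distinct value.
my ( %ip_seen, %url_seen, %code_count, %urls_by_code );

# A single pass over the records fills every counter.
for my $r (@response_array) {
    $ip_seen{ $r->{host} }++;                       # requests per IP
    $url_seen{ $r->{url} }++;                       # requests per URL
    $code_count{ $r->{code} }++;                    # requests per status code
    $urls_by_code{ $r->{code} }{ $r->{url} }++;     # URLs seen per status code
}

# "keys" in scalar context returns the number of keys, i.e. unique values.
my $unique_ip_count  = keys %ip_seen;
my $unique_url_count = keys %url_seen;

print "Unique IPs:  $unique_ip_count\n";
print "Unique URLs: $unique_url_count\n";

for my $code ( sort keys %code_count ) {
    my $unique_urls = keys %{ $urls_by_code{$code} };
    print "HTTP $code: $code_count{$code} requests, $unique_urls unique URLs\n";
}
```

The same counters could also be filled inside the while loop that reads the log, which would avoid keeping every record in memory if only the totals are needed.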
Re: Parsing tomcat access log
by marto (Cardinal) on Dec 16, 2010 at 16:01 UTC
Re: Parsing tomcat access log
by Anonyrnous Monk (Hermit) on Dec 16, 2010 at 15:10 UTC