Dear fellow monks,

I am writing a simple parser for the Tomcat localhost_access.log file to get the number of HTTP requests, the number of unique files requested, and so on. Here is the code snippet:

use Data::Dumper::Simple;

print "Welcome to Tomcat Access Log Parser\n";

$LogFile = "C:\\Documents and Settings\\snra\\Desktop\\localhost_access_log.txt";

# Parentheses are required to declare more than one variable with my
my ( $http_code_200_count, $http_code_404_count, $unique_ip_count, $unique_ip_journal_mf );
my @response_array = ();

use vars qw/
  $element_host $element_logname $element_date $element_method
  $element_url $element_code $element_size $element_referrer $element_agent
/;

my @field = ();
@attriubtelib = ( 'host', 'logname', 'date', 'method', 'url', 'code', 'size', 'referrer', 'agent' );

$TomcatLogFormat = '';
# $PerlParsingFormat="([^ ]+) [^ ]+ ([^\\/\\[]+) \\[([^ ]+) [^ ]+\\] \\\"([^ ]+) (.+) [^\\\"]+\\\" ([\\d|-]+) ([\\d|-]+) \\\"(.*?)\\\" \\\"([^\\\"]*)\\\"";
$TomcatLogFormat = "([^ ]+) [^ ]+ ([^\\/\\[]+) \\[([^ ]+) [^ ]+\\] \\\"([^ ]+) ([^ ]+)(?: [^\\\"]+|)\\\" ([\\d|-]+) ([\\d|-]+) \\\"(.*?)\\\" \\\"([^\\\"]*)\\\"";

# Index of each captured field in @field
$element_host     = 0;
$element_logname  = 1;
$element_date     = 2;
$element_method   = 3;
$element_url      = 4;
$element_code     = 5;
$element_size     = 6;
$element_referrer = 7;
$element_agent    = 8;

$TomcatLogFormat = qr/^$TomcatLogFormat/;

open( LOG, "$LogFile" )
  || die("Couldn't open tomcat access log file \"$LogFile\" : $!");

while ( $line = <LOG> ) {
    my %single_response_map = ();
    chomp $line;
    $line =~ s/\r$//;
    if ( !( @field = map( /$TomcatLogFormat/, $line ) ) ) {
        print "Line did not match the format !!!! - $line\n";
        next;    # skip unparsed lines instead of storing an empty hash
    }
    # Store each captured field under its attribute name
    for ( $i = 0 ; $i < @field ; ++$i ) {
        $single_response_map{ $attriubtelib[$i] } = $field[$i];
    }
    push @response_array, \%single_response_map;
}
close(LOG);

print Dumper @response_array;

This dumps an array of hashes. For example, each element of the array looks like this:

{
  'date' => '15/Dec/2010:12:36:21',
  'referrer' => '-',
  'size' => '1046',
  'host' => '192.0.0.222',
  'logname' => '-',
  'agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
  'url' => '/folder/File1/data1.txt',
  'method' => 'GET',
  'code' => '200'
}

Each hash corresponds to one HTTP request in the log file. I need to get the count of unique IP addresses, the count of unique URLs, and the count of unique requests per HTTP status code. Please help me get these counts from this array of hashes using any quick method. Otherwise I will have to iterate over each field individually with many flags in place.
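
For reference, one common way to get such counts in a single pass is to use hashes as "seen" sets, so no flags are needed. The following is only a minimal sketch: the three sample records are hypothetical (they just mimic the shape of the dumped hashes), and the names %ip_seen, %url_seen, %code_count and %urls_by_code are illustrative, not part of the script above. It also assumes "unique requests per status code" means unique URLs seen with that code.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as the dumped @response_array.
my @response_array = (
    { host => '192.0.0.222', url => '/folder/File1/data1.txt', code => '200' },
    { host => '192.0.0.222', url => '/folder/File2/data2.txt', code => '404' },
    { host => '192.0.0.223', url => '/folder/File1/data1.txt', code => '200' },
);

my ( %ip_seen, %url_seen, %code_count, %urls_by_code );

# One pass over the records; each hash key is created once per distinct value.
for my $request (@response_array) {
    $ip_seen{ $request->{host} }++;
    $url_seen{ $request->{url} }++;
    $code_count{ $request->{code} }++;
    $urls_by_code{ $request->{code} }{ $request->{url} }++;
}

# keys in scalar context returns the number of distinct keys.
my $unique_ip_count  = keys %ip_seen;
my $unique_url_count = keys %url_seen;

print "Unique IPs:  $unique_ip_count\n";
print "Unique URLs: $unique_url_count\n";
for my $code ( sort keys %code_count ) {
    my $unique_urls = keys %{ $urls_by_code{$code} };
    print "HTTP $code: $code_count{$code} requests, $unique_urls unique URLs\n";
}
```

The same counters could probably be filled directly inside the while loop that reads the log, which would avoid keeping every parsed line in memory for large files.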

Thanks in advance.


In reply to Parsing tomcat access log by snra_perl
