I am writing a quick script that parses my Apache access log and prints the most recent file accesses. The catch is that I want it to print only the most recent file accessed from each unique IP, and I want the output sorted by date.

Here is my code:

#!/usr/bin/perl
use warnings;
use strict;
use Date::Manip;
use vars qw(%ipHash);

while(<DATA>)
{
    /
    ^((\d{1,3}\.){3}\d{1,3})             # grab the IP address into $1
    \s\-\s\-\s\[
    (\d\d\/\w{3}\/\d\d(\d\d\:){3}\d\d)   # grab the date into $3
    \s\-\d{4}\]\s"\w{1,4}\s
    ([\/|\w|\.|_]+)                      # grab the file path into $5
    /x
    and $ipHash{&UnixDate($3,"%s")} = [$1, $3, $5];
}

print join "\n", map {$ipHash{$_}[0] . " => " . $ipHash{$_}[1] . "\t" . $ipHash{$_}[2]} sort keys %ipHash;

__DATA__
209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 303
209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%%35c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 400 303
209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 313
209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 313
68.9.44.75 - - [17/Oct/2002:06:50:34 -0400] "GET /phpMyAdmin HTTP/1.1" 301 322
68.9.44.75 - - [17/Oct/2002:06:50:34 -0400] "GET /phpMyAdmin/ HTTP/1.1" 200 898
68.9.44.75 - - [17/Oct/2002:06:50:36 -0400] "GET /phpMyAdmin/left.php?lang=en-iso-8859-1&convcharset=iso-8859-1&server=1 HTTP/1.1" 200 1024
129.22.39.158 - - [17/Oct/2002:18:05:10 -0400] "OPTIONS / HTTP/1.1" 200 0
160.79.211.121 - - [17/Oct/2002:19:51:31 -0400] "GET /default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0" 400 303
129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php HTTP/1.1" 200 25430
129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php?=PHPE9568F35-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 4440
129.22.82.8 - - [17/Oct/2002:20:37:10 -0400] "GET /index.php?=PHPE9568F34-D428-11d2-A769-00AA001ACF42 HTTP/1.1" 200 2962
129.22.82.8 - - [17/Oct/2002:21:25:44 -0400] "GET / HTTP/1.1" 200 2673
129.22.82.8 - - [17/Oct/2002:21:25:44 -0400] "GET /manual/images/apache_pb.gif HTTP/1.1" 404 302

That code produces the following output:

209.36.83.252 => 17/Oct/2002:05:53:17    /scripts/..
68.9.44.75 => 17/Oct/2002:06:50:34    /phpMyAdmin/
68.9.44.75 => 17/Oct/2002:06:50:36    /phpMyAdmin/left.php
160.79.211.121 => 17/Oct/2002:19:51:31    /default.ida
129.22.82.8 => 17/Oct/2002:20:37:10    /index.php
129.22.82.8 => 17/Oct/2002:21:25:44    /manual/images/apache_pb.gif

Ideally, I do not want the IPs repeated; I just want to see the last file accessed by each IP. So the output would look like:

209.36.83.252 => 17/Oct/2002:05:53:17    /scripts/..
68.9.44.75 => 17/Oct/2002:06:50:34    /phpMyAdmin/
160.79.211.121 => 17/Oct/2002:19:51:31    /default.ida
129.22.82.8 => 17/Oct/2002:21:25:44    /manual/images/apache_pb.gif

But to keep the output sorted by date, I key the hash by the Unix timestamp rather than by the IP. Would I need to set up some kind of dual-hash arrangement so I can sort by date but keep only one entry per IP address? I just can't seem to wrap my head around this and wondered if any monks had some nifty ideas.
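
For what it's worth, here is a rough sketch of the single-hash variant of that idea: key the hash by IP, store the timestamp inside each value, and sort on the stored timestamp at print time. This is only an illustration under some assumptions, not a tested solution. The helper names log_epoch and latest_per_ip are made up for this sketch, core Time::Local stands in for Date::Manip so the snippet runs without extra modules, and the timezone offset is ignored on the assumption that every line in the log carries the same one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in the log timestamps.
my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: turn "17/Oct/2002:05:53:17" into epoch seconds.
# The offset is ignored (assumption: all lines share one offset).
sub log_epoch {
    my ($stamp) = @_;
    my ($d, $mon, $y, $h, $m, $s) =
        $stamp =~ m{^(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)$}
        or return;
    return unless exists $MONTH{$mon};
    return timegm($s, $m, $h, $d, $MONTH{$mon}, $y);
}

# Hypothetical helper: keep only the newest entry per IP, then
# return the formatted lines sorted by timestamp.
sub latest_per_ip {
    my (@lines) = @_;
    my %latest;    # IP => [epoch, date string, path]
    for my $line (@lines) {
        my ($ip, $date, $path) = $line =~ m{
            ^(\d{1,3}(?:\.\d{1,3}){3})        # IP address
            \s-\s-\s\[
            (\d\d/\w{3}/\d{4}(?::\d\d){3})    # date without the offset
            \s[-+]\d{4}\]\s"\w{1,4}\s         # method of 1 to 4 characters, as in the original
            ([/\w.]+)                         # request path
        }x or next;
        my $epoch = log_epoch($date);
        next unless defined $epoch;
        # >= lets a later line with an identical timestamp win
        $latest{$ip} = [$epoch, $date, $path]
            if !$latest{$ip} || $epoch >= $latest{$ip}[0];
    }
    return map  { sprintf "%s => %s\t%s", $_, @{ $latest{$_} }[1, 2] }
           sort { $latest{$a}[0] <=> $latest{$b}[0] || $a cmp $b }
           keys %latest;
}

# A few lines borrowed from the sample data, in arbitrary order.
my @sample = (
    '68.9.44.75 - - [17/Oct/2002:06:50:34 -0400] "GET /phpMyAdmin/ HTTP/1.1" 200 898',
    '68.9.44.75 - - [17/Oct/2002:06:50:36 -0400] "GET /phpMyAdmin/left.php?lang=en-iso-8859-1&convcharset=iso-8859-1&server=1 HTTP/1.1" 200 1024',
    '129.22.82.8 - - [17/Oct/2002:21:25:44 -0400] "GET / HTTP/1.1" 200 2673',
    '209.36.83.252 - - [17/Oct/2002:05:53:17 -0400] "GET /scripts/..%252f../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 313',
);
print "$_\n" for latest_per_ip(@sample);
```

Since each IP maps to exactly one slot, no second hash is needed; the sort block simply compares the stored epoch values. One thing to note: with a strict "latest timestamp wins" rule, 68.9.44.75 would come out with its 06:50:36 request to /phpMyAdmin/left.php, because that is its newest entry in the sample data.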

Thanks,
enoch

P.S. You gotta love those 'default.ida?NNNNNNN' entries.

In reply to Parsing Apache Log to Get Most Recent File Access by enoch
