I'm a newbie with a deadline, so any help is appreciated. I need to write a program with a web interface where a period of time can be specified (last 2 hours, last 24 hours, last 2 hours and 15 minutes). The program then reads the web log, fetches all entries that fall within that period, extracts the referrers, and counts them to display the top 10 referrers, in order.
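For what it's worth, the counting step described above might look roughly like the sketch below. It assumes the log has already been reduced to [epoch seconds, referrer] pairs. The name top_referrers, that pair layout, and skipping "-" as an empty referrer are all assumptions made for illustration, not part of any existing program.

```perl
use strict;
use warnings;

# Hypothetical helper: $entries is an array ref of [epoch_seconds, referrer]
# pairs. Returns up to $limit [referrer, count] pairs, most frequent first.
sub top_referrers {
    my ($entries, $window_seconds, $now, $limit) = @_;
    $limit ||= 10;
    my $cutoff = $now - $window_seconds;

    my %count;
    for my $entry (@$entries) {
        my ($epoch, $referrer) = @$entry;
        next if $epoch < $cutoff || $epoch > $now;    # outside the period
        next if !defined $referrer || $referrer eq '' || $referrer eq '-';
        $count{$referrer}++;
    }

    # Sort by descending count, breaking ties alphabetically.
    my @sorted = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice @sorted, $limit if @sorted > $limit;
    return map { [ $_, $count{$_} ] } @sorted;
}

# "Last 2 hours and 15 minutes" expressed in seconds.
my $window = 2 * 3600 + 15 * 60;
my @sample = ( [ time() - 60, 'http://example.com/a' ] );    # made-up data
for my $row ( top_referrers(\@sample, $window, time()) ) {
    print "$row->[0]: $row->[1]\n";
}
```

Passing the current time in as a parameter, rather than calling time() inside the sub, keeps the period arithmetic easy to check by hand.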

As a first step, I'm trying to pull the time and HTTP referrer out of the web log data, but it's not going well. The only way I can see to do it is to strip the unwanted parts out of each log line and then use the timelocal function to convert the log time to epoch seconds, so that it can be compared against the current time minus the requested period. Here's what I have so far as a test:

    #!/usr/local/bin/perl
    use CGI qw(:standard);
    use CGI::Carp qw(fatalsToBrowser carpout);
    use Time::Local;

    print "Content-type: text/html\n\n";
    #$time = timelocal($sec,$min,$hour,$mday,$mon,$year);
    open LOGFILE, "datafile.html";
    @log_data = <LOGFILE>;
    foreach $log_line (@log_data) {
        $log_line =~ s/.*\[/ /;
        $log_line =~ s/"GET.*"h/ /;
        $log_line =~ s/".*/ /;
        print $log_line, "<p>";
    }

The last substitution on $log_line (the s/".*/ / one) does not work.

The datafile.html contains data in this form (each date/time is enclosed in square brackets):

    24.208.200.247 - - [10/Dec/2002:18:05:09 -0500] "GET /images/header_aod2_08.gif HTTP/1.0" 200 663 "http://www.indystar.com/help/help/available.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; H010818)"
    24.208.200.247 - - [10/Dec/2002:18:05:09 -0500] "GET /images/header_aod2_10.gif HTTP/1.0" 304 - "http://www.indystar.com/help/help/available.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; H010818)"
    24.208.200.247 - - [10/Dec/2002:18:05:09 -0500] "GET /images/storysearch2.gif HTTP/1.0" 200 142 "http://www.indystar.com/help/help/available.html" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; H010818)"
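As an alternative to stripping pieces off the line with successive substitutions, a single capturing match can pull out the timestamp fields and the referrer in one pass. The sketch below assumes every line follows the layout of the sample above. The name parse_log_line is made up for illustration, and it uses timegm plus the logged UTC offset rather than timelocal, which is a swapped-in choice so the result does not depend on the server's own time zone.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: returns (epoch_seconds, referrer) for one log line,
# or an empty list if the line does not match the expected layout.
sub parse_log_line {
    my ($line) = @_;
    my ($day, $mon, $year, $h, $m, $s, $sign, $oh, $om, $referrer) =
        $line =~ m{
            \[ (\d{2}) / (\w{3}) / (\d{4}) : (\d{2}) : (\d{2}) : (\d{2})
               \s ([+-]) (\d{2}) (\d{2}) \]
            \s+ "[^"]*"       # request line, ignored
            \s+ \d{3}         # status code
            \s+ \S+           # byte count, or - when absent
            \s+ "([^"]*)"     # referrer
        }x or return;
    return unless exists $MONTH{$mon};

    # Treat the logged fields as UTC, then subtract the logged offset
    # to get true epoch seconds.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    my $epoch  = timegm($s, $m, $h, $day, $MONTH{$mon}, $year) - $offset;
    return ($epoch, $referrer);
}

my $sample = '24.208.200.247 - - [10/Dec/2002:18:05:09 -0500] '
           . '"GET /images/header_aod2_08.gif HTTP/1.0" 200 663 '
           . '"http://www.indystar.com/help/help/available.html" '
           . '"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; H010818)"';
my ($epoch, $referrer) = parse_log_line($sample);
print "$epoch $referrer\n" if defined $epoch;
```

The resulting epoch value can be compared directly with time() minus the requested number of seconds.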

In reply to pulling by regex by mkent
