Hey, guys, thanks!!! This is a wonderful resource, and I incorporated some suggestions into the revised script below. I still have some questions, though!

BrowserUk, I decided against using Date::Manip even though I really like that module. Its documentation warns that it's slower than other date/time modules, and this script will be used most often when the web server is overloaded with requests, so speed is essential.

Abigail-II, a database would be nice, but the server is producing regular logs, so that's what I have to use.

Here are my questions about the script below:

1) Using strict produces errors that I don't have a global module loaded; what module is that?

2) The simulated $month switch statement doesn't work as expected; instead of values 0 through 11, it gives everything a value of 1. Getting it changed to a number makes timelocal accurate.

3) At the end, I push all the referrers into an array. What I need to do is count each unique referrer URL, so that www.you.com is counted x times and www.me.com is counted y times. Then I can tell the top referrer in the time period stipulated by the web page (which just has hours and minutes to enter). That will let me create output like
www.you.com 22
www.me.com 19
etc
How can I count an unknown value and produce this output? And is an array the best way to do it?
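
On question 3, one common approach is a hash keyed by referrer instead of an array: each time a referrer turns up, its entry is incremented, and the keys can be sorted by count at the end. The following is only a minimal sketch of that idea. The helper names count_referrers and by_popularity and the sample list are made up for illustration and are not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: tally how often each referrer string occurs.
sub count_referrers {
    my @refers = @_;
    my %count;
    $count{$_}++ for @refers;
    return \%count;
}

# Hypothetical helper: referrers ordered by count (highest first),
# with ties broken alphabetically so the order is stable.
sub by_popularity {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

# Made-up sample data standing in for @refers from the script.
my @refers = qw(www.you.com www.me.com www.you.com www.you.com www.me.com);

my $count = count_referrers(@refers);
print "$_ $count->{$_}\n" for by_popularity($count);
```

With the made-up sample above this prints "www.you.com 3" followed by "www.me.com 2". In the real script the increment could happen directly inside the log loop, in place of the push, so no intermediate array is needed.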

Any and all ideas welcome, and thanks in advance. I really appreciate the help!

Here's the script, followed by some raw log data:

#!/usr/local/bin/perl
#use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser carpout);
use Time::Local;

# Grab information returned by web page
$hour = param ("hour");
$minute = param ("minute");

# Allow perl to write to browser window
print "Content-type: text/html\n\n";

# Current time in seconds
$now = time;

# Convert submitted time to seconds
$compare_time = ($hour * 3600) + ($minute * 60);

# Times extracted by logs must be >= to $target
$target = $now - $compare_time;

open LOGFILE, "datafile.html" || die "Can't open file";
@log_data =<LOGFILE>;

# Grab useful information from each line of the web log
foreach $log_line(@log_data) {
    # Grab date/time and referer
    ($date_string, $referrer) = ($log_line =~ /\[([^\]]+)\] "[^"]+"[^"]+"([^"]+)"/);
    # Replace / and : with spaces
    $date_string =~ s!/! !g;
    $date_string =~ s!:! !g;
    # Dump junk at end of line
    $date_string =~ s! -[0-9]+!!;
    # Split date/time into useful information
    ($day, $month, $year, $hhour, $min, $sec) = split(' ', $date_string);
    # Convert month from text to number
    if ($month == 'Jan') {$month = 0}
    elsif ($month == 'Feb') {$month = 1}
    elsif ($month == 'Mar') {$month = 2}
    elsif ($month == 'Apr') {$month = 3}
    elsif ($month == 'May') {$month = 4}
    elsif ($month == 'Jun') {$month = 5}
    elsif ($month == 'Jul') {$month = 6}
    elsif ($month == 'Aug') {$month = 7}
    elsif ($month == 'Sep') {$month = 8}
    elsif ($month == 'Oct') {$month = 9}
    elsif ($month == 'Nov') {$month = 10}
    else {$month = 11}
    # Calculate time on the log line in seconds
    $log_time = timelocal($sec,$min,$hhour,$day,$month,$year);
    if ($log_time >= $target) {
        push @refers, $referrer;
    }
}
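
A few observations on the script, offered as a sketch rather than a definitive rewrite. The errors strict reports (typically of the form "Global symbol ... requires explicit package name") usually mean a variable was not declared with my, not that a module is missing. The == operator compares numbers, and month names such as 'Jan' and 'Dec' all numify to 0, so the first branch of the if chain always matches; eq, or a lookup hash, compares the strings themselves. Finally, || binds more tightly than the comma in the open call, so the die never runs; or avoids that. The helper names parse_log_line and recent_referrers below are hypothetical, the regex assumes lines shaped like the sample data, and, like the script above, the sketch ignores the -0500 offset and treats the timestamp as server local time.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Time::Local;

# Month abbreviation => 0-based month number, as timelocal expects.
my %month_number = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: returns (epoch_seconds, referrer) for one log line,
# or an empty list when the line does not look like the sample data.
sub parse_log_line {
    my ($line) = @_;
    my ($day, $mon, $year, $hour, $min, $sec, $referrer) = $line =~
        m!\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) [^\]]*\] "[^"]*" \d+ \S+ "([^"]*)"!
        or return;
    return unless exists $month_number{$mon};
    my $epoch = timelocal($sec, $min, $hour, $day, $month_number{$mon}, $year);
    return ($epoch, $referrer);
}

# Hypothetical helper: collect referrers from entries at or after $target.
sub recent_referrers {
    my ($path, $target) = @_;
    # "or" (not "||") so that die actually fires when open fails.
    open my $log, '<', $path or die "Can't open $path: $!";
    my @refers;
    # Read one line at a time instead of slurping the whole log into memory.
    while (my $line = <$log>) {
        my ($log_time, $referrer) = parse_log_line($line) or next;
        push @refers, $referrer if $log_time >= $target;
    }
    close $log;
    return @refers;
}

# One line from the sample data, with the user agent shortened.
my $sample = '216.45.43.42 - - [12/Dec/2002:18:39:15 -0500] '
           . '"GET /news/opinions/varvel.gif HTTP/1.1" 302 313 '
           . '"http://www.freerepublic.com/forum/a3a95ca3c24a0.htm" "Mozilla/4.0"';
my ($epoch, $referrer) = parse_log_line($sample);
print "$referrer\n";
```

Reading the log line by line also keeps memory use flat on a large file, which may matter when the server is already busy.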

Some data:

216.45.43.42 - - [12/Dec/2002:18:39:15 -0500] "GET /news/opinions/varvel.gif HTTP/1.1" 302 313 "http://www.freerepublic.com/forum/a3a95ca3c24a0.htm" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /images/header_aod2_15.gif HTTP/1.1" 200 4162 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /images/storysearch2.gif HTTP/1.1" 200 142 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:15 -0500] "GET /users/ads/misc/remax_searchad3.gif HTTP/1.1" 200 2335 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/sports_03_aod.gif HTTP/1.1" 200 3195 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/email.gif HTTP/1.1" 200 138 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/print.gif HTTP/1.1" 200 139 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/sidelinksend2.gif HTTP/1.1" 200 1009 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/pics2/image-007735-7410.jpg HTTP/1.1" 200 18319 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:16 -0500] "GET /images/advertisement_250strip.gif HTTP/1.1" 200 238 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"
12.222.75.65 - - [12/Dec/2002:18:39:17 -0500] "GET /users/ads/story/macselect/macselect_250_Oct.gif HTTP/1.1" 200 10436 "http://www.indystar.com/print/articles/1/007735-7671-036.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSOCD; Q312461; YComp 5.0.0.0; .NET CLR 1.0.3705)"

update (broquaint): changed <pre> tags to <code> tags


In reply to Re: Re: pulling by regex by mkent
in thread pulling by regex by mkent
