in reply to Extract the lines
When comparing date/times you need to convert to something that can be compared. There are two basic options:
1. convert to epoch time (an integer count of seconds since 1970) and use numeric compares on that number
2. convert to a text representation that allows you to use string compares.
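For contrast, option (1) could look roughly like the sketch below. It assumes the core Time::Local module and timestamps in the "21/Aug/2010:06:00:00" form with a +0000 (UTC) offset; `clf_to_epoch` and `%month_index` are hypothetical names made up for this illustration.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation -> 0-based month index, which is what timegm() expects.
my %month_index = (Jan => 0, Feb => 1, Mar => 2, Apr => 3,
                   May => 4, Jun => 5, Jul => 6, Aug => 7,
                   Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: "21/Aug/2010:06:00:00" -> epoch seconds.
# Assumption: the timestamp is UTC (+0000), so the offset is ignored.
sub clf_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $h, $m, $s) =
        $stamp =~ m|^(\d{1,2})/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)$|
        or return;                          # not in the expected format
    return unless exists $month_index{$mon};
    return timegm($s, $m, $h, $day, $month_index{$mon}, $year);
}

my $start = clf_to_epoch('21/Aug/2010:06:00:00');
my $line  = clf_to_epoch('22/Aug/2010:01:00:00');

# Epoch values are plain integers, so numeric operators are used.
print "line is inside the window\n" if $line >= $start;
```

The compares are cheap, but every line pays for a module call, and the resulting number is not something you can read in a log.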
Below I show method (2) because if you have any influence over the format of this log file, this is a HUGE hint on what would be better!
A string like "2010-08-22 01:00:00" can be compared to "2010-08-21 06:00:00" with the string operators le, gt and cmp, without calling any Perl module or function. And it is "human readable" as opposed to an integer epoch time. Please note that leading zeroes are important in this type of format: the fields must be fixed width and ordered from most significant (year) to least significant (second) for string order to match time order!
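A small sketch of why this works, and of what goes wrong without the leading zeroes. The variable names and the unpadded sample strings are invented for this illustration.

```perl
use strict;
use warnings;

my $early = "2010-08-21 06:00:00";
my $late  = "2010-08-22 01:00:00";

# Fixed-width, year-first strings: string order equals time order.
print "padded: late sorts after early\n" if $late gt $early;
print "cmp says early comes first\n"     if ($early cmp $late) < 0;

# Without leading zeroes the compare is done character by character,
# so '9' beats '1' and Aug 9 wrongly sorts after Aug 10.
my $aug9  = "2010-8-9 00:00:00";
my $aug10 = "2010-8-10 00:00:00";
print "unpadded: Aug 9 sorts AFTER Aug 10 (wrong)\n" if $aug9 gt $aug10;
```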
Below I show just one way to do a format conversion like this. I didn't spend a billion hours making this as efficient as possible. Just trying to demonstrate the idea.
I think your "sort" was just wasted CPU cycles. Write code that processes the file, use the reformat_date_time() subroutine to get the reformatted date/time for each line, and keep the lines that are ge "2010-08-21 06:00:00" and le "2010-08-22 09:00:00" using the string compare operators.
If I were doing some huge sort, I would be tempted to, and probably would, convert the times to epoch values to speed up the compares in the sort. But here you are going to "touch" each input line exactly once to convert the date/time info into a better string, and then either save that line or not.
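The window test itself might be sketched as follows. It assumes the timestamps have already been converted to the "YYYY-MM-DD HH:MM:SS" form; `in_window` and the variable names are hypothetical, and the sample stamps are the ones the script in this post prints.

```perl
use strict;
use warnings;

my $window_start = "2010-08-21 06:00:00";
my $window_end   = "2010-08-22 09:00:00";

# Hypothetical helper: true if $stamp lies in [$start, $end], inclusive.
# Pure string compares; this relies on the zero-padded, year-first format.
sub in_window {
    my ($stamp, $start, $end) = @_;
    return ($stamp ge $start) && ($stamp le $end);
}

my @stamps = ("2010-08-21 00:00:00",
              "2010-08-21 00:01:00",
              "2010-08-22 01:00:00",
              "2010-08-22 02:00:00",
              "2010-08-22 03:04:00");

# One pass over the input, no sort needed.
my @kept = grep { in_window($_, $window_start, $window_end) } @stamps;
print "$_\n" for @kept;
```

With these five stamps, the two from just after midnight on the 21st fall before the window and the three from the 22nd are kept. In a real run you would keep the whole log line, not just the stamp.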
#!/usr/bin/perl -w
use strict;

# Month abbreviation -> two-digit month number (as a string, zero padded)
my %month2numstring = (Jan => '01', Feb => '02', Mar => '03', Apr => '04',
                       May => '05', Jun => '06', Jul => '07', Aug => '08',
                       Sep => '09', Oct => '10', Nov => '11', Dec => '12',
                      );

while (<DATA>)
{
   my $datefield = (split)[3];    # 4th field, e.g. "[21/Aug/2010:00:00:00"
   my ($datestring) = $datefield =~ m|\[([\w/:]+)|;
   my $new_date_field = reformat_date_time($datestring);
   print "$new_date_field\n";
}

# "21/Aug/2010:00:00:00" -> "2010-08-21 00:00:00"
sub reformat_date_time
{
   my $date_time = shift;
   # match against the argument, not against $_
   my ($date,$time) = $date_time =~ m|([\w/]+):([\d:]+)|;
   my ($day,$month_text,$year) = split(m|/|,$date);
   $day = "0$1" if $day =~ m|^(\d)$|; #force leading zero
   my $month = $month2numstring{$month_text};
   return ($year.'-'.$month.'-'.$day." $time");
}

=prints
2010-08-21 00:00:00
2010-08-21 00:01:00
2010-08-22 01:00:00
2010-08-22 02:00:00
2010-08-22 03:04:00
=cut

__DATA__
67.162.10.216 - - [21/Aug/2010:00:00:00 +0000] GET /2010-08-18/news/ct-met-barrington-student-death-20100818_1_mental-illness-suicide-prevention-teen-suicides HTTP/1.1 200 6826
67.162.10.216 - - [21/Aug/2010:00:01:00 +0000] GET /2010-08-18/news/ct-met-barrington-student-death-20100818_1_mental-illness-suicide-prevention-teen-suicides HTTP/1.1 200 6826
67.162.10.216 - - [22/Aug/2010:01:00:00 +0000] GET /tracker.js.php?45aa01ed37b58d2a537b1ba12bb97fe2e5695a8c HTTP/1.1 200 2915
67.162.10.216 - - [22/Aug/2010:02:00:00 +0000] GET /tracker.js.php?45aa01ed37b58d2a537b1ba12bb97fe2e5695a8c HTTP/1.1 200 2882
66.249.71.98 - - [22/Aug/2010:03:04:00 +0000] GET /ad-openx.php?out=js&d=mod-top-hdr-defer&z-i=24809&z-n=top-...blah....