I suspect that your sort of the "date/time" field didn't work as well as you think! The format in the file won't sort in ascending date order with a plain alphanumeric sort. For example, "Aug" sorts before "Jan", although we know that's not right!

When comparing date/times you need to convert them to something that can be compared. There are two basic options:
1. convert to epoch time (a large integer, seconds since 1970) and use numeric compares on that number
2. convert to a text representation that allows you to use string compares.
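
For contrast, option (1) could look roughly like the sketch below. It is only an illustration: to_epoch() and %month_index are names made up here, Time::Local is a core module, and the time is assumed to be UTC (the log lines carry a +0000 offset).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);    # core module; timegm() treats its input as UTC

# Month abbreviation -> 0-based month index, which is what timegm() expects
my %month_index = (Jan => 0, Feb => 1, Mar => 2, Apr => 3,
                   May => 4, Jun => 5, Jul => 6, Aug => 7,
                   Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: "21/Aug/2010:06:00:00" -> seconds since the epoch
sub to_epoch
{
    my ($stamp) = @_;
    my ($day, $mon, $year, $h, $m, $s) =
        $stamp =~ m|^(\d{1,2})/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)$|
        or die "unrecognised date/time: $stamp\n";
    die "unknown month: $mon\n" unless exists $month_index{$mon};
    return timegm($s, $m, $h, $day, $month_index{$mon}, $year);
}

my $early = to_epoch('21/Aug/2010:06:00:00');
my $late  = to_epoch('22/Aug/2010:01:00:00');
print "numeric compare: early is before late\n" if $early < $late;
```

The numbers compare quickly with <, > and <=>, but they are not something you can read in a log file.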

Below I show method (2) because, if you have any influence upon the format of this log file, this is a HUGE hint on what would be better!

A string like "2010-08-22 01:00:00" can be compared to "2010-08-21 06:00:00" with the string operators le, gt and cmp, without calling any Perl module or function. And it is "human readable", as opposed to an integer epoch time. Please note that leading zeroes are important in this type of format!
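
A small sketch of why this works, and why the leading zeroes matter. The variable names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $earlier = "2010-08-21 06:00:00";
my $later   = "2010-08-22 01:00:00";

# Year, month, day, hour, minute, second run from most to least
# significant, so string order is the same as time order.
print "string compare: earlier lt later\n" if $earlier lt $later;

# Perl's default sort is a string sort, so it works on these as-is.
my @stamps = ($later, $earlier);
my @sorted = sort @stamps;
print "first after sort: $sorted[0]\n";

# Without the leading zero the order breaks: '8' sorts after '1',
# so August lands after October.
my $august  = "2010-8-21 06:00:00";
my $october = "2010-10-01 06:00:00";
print "no leading zero: August wrongly sorts after October\n"
    if $august gt $october;
```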

The code below shows just one way to do a format conversion like this. I didn't spend a billion hours making it as efficient as possible. I'm just trying to demonstrate the idea.

I think your "sort" was just wasted CPU MIPS. Write code that processes the file line by line, uses the reformat_date_time() subroutine to get the reformatted date/time for each line, and looks for lines that are ge "2010-08-21 06:00:00" and le "2010-08-22 09:00:00" using the string compare operators.

If I were doing some huge sort, I would be tempted to convert the times to epoch values to speed up the compares in the sort, and I probably would. But here you are going to "touch" each input line exactly once to convert the date/time info into a better string, and then either save that line or not.
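
That single pass could be sketched like this. It is a sketch under assumptions, not a finished tool: sortable_stamp() is a compact stand-in for the reformat_date_time() idea, in_window() is a hypothetical helper, and the sample lines are shortened lines in the same layout as the log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %month2num = (Jan => '01', Feb => '02', Mar => '03', Apr => '04',
                 May => '05', Jun => '06', Jul => '07', Aug => '08',
                 Sep => '09', Oct => '10', Nov => '11', Dec => '12');

# "21/Aug/2010:06:00:00" -> "2010-08-21 06:00:00" (undef if it doesn't parse)
sub sortable_stamp
{
    my ($stamp) = @_;
    my ($day, $mon, $year, $time) =
        $stamp =~ m|^(\d{1,2})/(\w{3})/(\d{4}):([\d:]+)$| or return;
    my $month = $month2num{$mon} or return;
    return sprintf "%04d-%s-%02d %s", $year, $month, $day, $time;
}

# True if the line's date/time falls inside [$from, $to], ends included
sub in_window
{
    my ($line, $from, $to) = @_;
    my ($raw) = $line =~ m|\[([\w/:]+)| or return 0;
    my $stamp = sortable_stamp($raw) or return 0;
    return ($stamp ge $from && $stamp le $to) ? 1 : 0;
}

# Shortened sample lines; a real script would read the log file instead
my @sample_lines = (
    '67.162.10.216 - - [21/Aug/2010:00:00:00 +0000] GET /a HTTP/1.1 200 6826',
    '67.162.10.216 - - [22/Aug/2010:01:00:00 +0000] GET /b HTTP/1.1 200 2915',
    '66.249.71.98 - - [22/Aug/2010:03:04:00 +0000] GET /c HTTP/1.1 200 2882',
);

my ($from, $to) = ("2010-08-21 06:00:00", "2010-08-22 09:00:00");
for my $line (@sample_lines)
{
    print "$line\n" if in_window($line, $from, $to);
}
```

Each line is converted once and then kept or dropped; nothing is sorted.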

#!/usr/bin/perl -w
use strict;

# month abbreviation => two-digit month number (leading zero included)
my %month2numstring = (Jan => '01', Feb => '02', Mar => '03',
                       Apr => '04', May => '05', Jun => '06',
                       Jul => '07', Aug => '08', Sep => '09',
                       Oct => '10', Nov => '11', Dec => '12',
                      );

while (<DATA>)
{
    my $datefield = (split)[3];     # e.g. "[21/Aug/2010:00:00:00"
    my ($datestring) = $datefield =~ m|\[([\w/:]+)|;
    my $new_date_field = reformat_date_time($datestring);
    print "$new_date_field\n";
}

# "21/Aug/2010:00:00:00" => "2010-08-21 00:00:00"
sub reformat_date_time
{
    my $date_time = shift;
    my ($date,$time) = $date_time =~ m|([\w/]+):([\d:]+)|;
    my ($day,$month_text,$year) = split(m|/|,$date);
    $day = "0$1" if $day =~ m|^(\d)$|;   #force leading zero
    my $month = $month2numstring{$month_text};
    return ($year.'-'.$month.'-'.$day." $time");
}

=prints
2010-08-21 00:00:00
2010-08-21 00:01:00
2010-08-22 01:00:00
2010-08-22 02:00:00
2010-08-22 03:04:00
=cut

__DATA__
67.162.10.216 - - [21/Aug/2010:00:00:00 +0000] GET /2010-08-18/news/ct-met-barrington-student-death-20100818_1_mental-illness-suicide-prevention-teen-suicides HTTP/1.1 200 6826
67.162.10.216 - - [21/Aug/2010:00:01:00 +0000] GET /2010-08-18/news/ct-met-barrington-student-death-20100818_1_mental-illness-suicide-prevention-teen-suicides HTTP/1.1 200 6826
67.162.10.216 - - [22/Aug/2010:01:00:00 +0000] GET /tracker.js.php?45aa01ed37b58d2a537b1ba12bb97fe2e5695a8c HTTP/1.1 200 2915
67.162.10.216 - - [22/Aug/2010:02:00:00 +0000] GET /tracker.js.php?45aa01ed37b58d2a537b1ba12bb97fe2e5695a8c HTTP/1.1 200 2882
66.249.71.98 - - [22/Aug/2010:03:04:00 +0000] GET /ad-openx.php?out=js&d=mod-top-hdr-defer&z-i=24809&z-n=top-...blah....

In reply to Re: Extract the lines by Marshall
in thread Extract the lines by Anonymous Monk
