Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

b 05/Jul/2010:07:00:10
a 05/Jul/2010:06:00:09
b 05/Jul/2010:07:00:10
c 05/Jul/2010:07:10:16
d 05/Jul/2010:08:00:10
e 05/Jul/2010:09:00:10
f 05/Jul/2010:10:00:10
h 05/Jul/2010:11:00:10
i 05/Jul/2010:12:00:20
j 05/Jul/2010:13:00:10
k 05/Jul/2010:14:00:10
l 05/Jul/2010:15:00:30
m 05/Jul/2010:16:00:10
n 05/Jul/2010:17:00:10
o 05/Jul/2010:18:00:10
p 05/Jul/2010:19:00:40
q 05/Jul/2010:20:00:10
a 05/Jul/2010:21:00:10
b 05/Jul/2010:22:00:50
v 05/Jul/2010:23:00:20
g 06/Jul/2010:01:00:10
k 06/Jul/2010:02:00:10
i 06/Jul/2010:03:00:14
j 06/Jul/2010:04:00:10
k 06/Jul/2010:05:00:18
l 06/Jul/2010:06:00:10
m 06/Jul/2010:07:00:10
n 06/Jul/2010:08:00:19
n 06/Jul/2010:09:00:10

The input file has 19739530 lines. How can I extract the lines that are 3 days old, dated from 05/Jul/2010:06 to 06/Jul/2010:09 the next day, from the input file into another file? Please help me. How can I make the execution faster?

Replies are listed 'Best First'.
Re: Extract the lines from file
by ikegami (Patriarch) on Jul 09, 2010 at 06:15 UTC

    How to extract the date that is of 3 days old

    Dates don't age, so that makes no sense. My only guess as to what that means is "the date of 3 days ago", but that makes no sense in context. Please explain more clearly. Perhaps you could show what output you expect from the data you showed?

    Update: I think I got it. You want the lines from the 24 hours that end 3*24 hours ago.

    Personally, I'd use a date-time format that lexically sorts in chronological order (e.g. 2010/07/06:07:00:10). Then you could precalculate the start and end timestamps, and use string comparisons. It's much faster to compare two strings than to parse a date and then compare two dates.
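
    A minimal sketch of that idea, assuming the timestamps look like the ones in the sample (dd/Mon/yyyy:hh:mm:ss, with a one- or two-digit day). The names sortable_stamp and in_window are made up for illustration, and the start and end bounds are assumed to be precalculated in the same sortable form.

```perl
use strict;
use warnings;

# Month abbreviation -> two-digit month number.
my %MONTH;
@MONTH{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} =
    map { sprintf '%02d', $_ } 1 .. 12;

# Turn "05/Jul/2010:07:00:10" (or "5/Jul/...") into "2010/07/05:07:00:10".
# Returns undef when the text contains no recognizable timestamp.
sub sortable_stamp {
    my ($text) = @_;
    my ($d, $mon, $y, $hms) =
        $text =~ m{(\d{1,2})/([A-Z][a-z]{2})/(\d{4}):(\d{2}:\d{2}:\d{2})}
        or return;
    my $m = $MONTH{$mon} or return;
    return sprintf '%s/%s/%02d:%s', $y, $m, $d, $hms;
}

# True when the line's stamp falls in [$from, $till). Both bounds are
# already in sortable form, so plain string operators are enough.
sub in_window {
    my ($line, $from, $till) = @_;
    my $stamp = sortable_stamp($line);
    return defined $stamp && $stamp ge $from && $stamp lt $till;
}
```

    With these, the filter loop reduces to something like: print if in_window($_, $from, $till). If the file were written with sortable stamps in the first place, the conversion step would disappear entirely.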

    You're probably searching through the file linearly, which could require reading a lot of lines in which you have no interest, especially if the lines you want are near the end of the file. Since the entries in the files are sorted, you could do a binary search of the file with the help of seek. Then you'd only need to read 25 lines (log2(19739530)) to find the lines in which you are interested.
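
    A rough sketch of such a binary search, assuming the file really is sorted by timestamp, every line carries one, and lines end in "\n". The names seek_to_line and first_offset_at_or_after are hypothetical, and $key is an assumed callback that maps a line to a string that sorts chronologically.

```perl
use strict;
use warnings;

# Seek to byte $pos and move forward to the start of the next full line.
# Offset 0 is already a line start; anywhere else we may be mid-line, so
# the rest of that line is read and thrown away. Returns the new offset.
sub seek_to_line {
    my ($fh, $pos) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    if ($pos > 0) { my $discard = <$fh>; }
    return tell($fh);
}

# Binary search over byte offsets. Returns the offset of the first line
# whose key is ge $target, or the file size if there is no such line.
# $key is a hypothetical callback: line in, sortable string out.
sub first_offset_at_or_after {
    my ($fh, $target, $key) = @_;
    my ($lo, $hi) = (0, -s $fh);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        seek_to_line($fh, $mid);
        my $line = <$fh>;
        if (!defined $line || $key->($line) ge $target) {
            $hi = $mid;          # answer is at or before this probe
        }
        else {
            $lo = $mid + 1;      # answer is after this probe
        }
    }
    return seek_to_line($fh, $lo);
}
```

    After the call, the handle is positioned at the first wanted line, so you can read forward with a plain while loop and stop as soon as a line's key reaches the end timestamp. Each probe reads at most two lines, so the total stays in the tens of lines rather than millions.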

      #!/usr/bin/perl
      use Date::Manip;

      $ndays=$ARGV[0];

      # Get the logs
      $date = ParseDate("$ndays days ago");
      print UnixDate($date,"%m/%d/%Y") . "\n";

      In the input file, the entries are dated 3 days and 2 days ago. I have to extract the time from it. Suppose today's date is Thu Jul 8 2010. This is the input file:

      - [5/Jul/2010:00:59:59 +0000]
      - [5/Jul/2010:00:10:00 +0000]
      - [5/Jul/2010:06:10:00 +0000]
      - [5/Jul/2010:06:50:00 +0000]
      - [5/Jul/2010:07:10:00 +0000]
      - [5/Jul/2010:10:10:00 +0000]
      - [6/Jul/2010:06:10:00 +0000]
      - [6/Jul/2010:07:10:00 +0000]
      - [5/Jul/2010:08:10:00 +0000]
      - [5/Jul/2010:06:10:00 +0000]
      - [5/Jul/2010:09:10:00 +0000]
      - [5/Jul/2010:10:00:00 +0000]
      - [5/Jul/2010:10:15:00 +0000]

      Extract the rows where the time is greater than 5/Jul/2010:06:00:00 and less than 6/Jul/2010:10:00:00. So the expected output is:

      - [5/Jul/2010:06:10:00 +0000]
      - [5/Jul/2010:06:50:00 +0000]
      - [5/Jul/2010:07:10:00 +0000]
      - [5/Jul/2010:10:10:00 +0000]
      - [6/Jul/2010:06:10:00 +0000]
      - [6/Jul/2010:07:10:00 +0000]
      - [5/Jul/2010:08:10:00 +0000]
      - [5/Jul/2010:06:10:00 +0000]
      - [5/Jul/2010:09:10:00 +0000]
      - [5/Jul/2010:10:00:00 +0000]

        Still clear as mud, Anonymous Monk. Since you have already found Date::Manip, you should be mostly there. Is this what you are after?

        #!/usr/bin/perl
        use Date::Manip;
        use Getopt::Long;

        my ($from_text, $till_text);
        # Both options take a string value, e.g. --from "3 days ago"
        GetOptions(
            'from=s' => \$from_text,
            'till=s' => \$till_text,
        );

        my $from_date = ParseDate($from_text);
        unless ($from_date) { die "Invalid From date\n"; }
        my $till_date = ParseDate($till_text);
        unless ($till_date) { die "Invalid till date\n"; }
        # The range must run forwards: from has to be earlier than till
        unless (Date_Cmp($from_date,$till_date) < 0) { die "Horribly\n"; }

        while(<>) {
            # ... read the date ...
            if (Date_Cmp($date,$from_date) >= 0 and Date_Cmp($date,$till_date) < 0) {
                # ... here we go ...
            }
        }
Re: Extract the lines from file
by Anonymous Monk on Jul 09, 2010 at 05:50 UTC