in reply to Optimise the script
I would get at the time field as early as possible. split can be a time-consuming critter; cut it short by using its LIMIT parameter so it stops once it has the fields you need:
```perl
# LIMIT of 5: split stops after the 4th tab; the rest of the line lands in element 4
my @tab_delimited_array = split(/\t/, $_, 5);
```
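To see what LIMIT actually does, here is a small sketch on a made-up tab-delimited line. The field values and the assumption that the timestamp is the 4th field are hypothetical, not taken from the original poster's log. With a limit of 5, split produces at most 5 fields and leaves everything after the 4th tab, tabs included, unsplit in the last element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical log line; the timestamp is assumed to be the 4th field.
my $line = "host1\tGET\t/index.html\t[26/Mar/2011:06:00:00 ]\t200\t512\tMozilla";

# LIMIT 5: at most 5 fields are produced; the tail of the line is not split.
my @fields = split /\t/, $line, 5;

print scalar(@fields), " fields\n";   # number of fields produced
print "time: $fields[3]\n";           # the timestamp field
print "rest: $fields[4]\n";           # remaining text, still containing tabs
```

The work saved is everything after the 4th tab: split never scans or copies the trailing fields into separate scalars.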
I agree with happy.barney. The routines driven by format strings like %Y, %H, etc. are really slow compared with the low-level functions; strftime() is notorious for being slow. I would send $tab_delimited_array[3] directly into code like the epoch routine below.
timegm() is implemented very efficiently and caches the months it has already seen: a few arithmetic operations and, bingo, you have an epoch time number. Even if it does some multiplies, that's no big deal; on a modern Intel processor they run at about the same speed as other integer ops. There is also a "no error checking" version, timegm_nocheck(), that you can import, although I don't think you will need it.
Calculate your search range in advance and convert both endpoints to epoch integers; then a couple of integer compares per line give you a quick yes/no decision.
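A minimal sketch of that idea, assuming a hypothetical search window of 06:00 to 12:00 UTC on 26/Mar/2011; the names $range_start, $range_end and in_range() are illustrative, not from the original script. The endpoints are converted once with timegm(), and each line then costs only two numeric compares.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical search range; both endpoints are computed ONCE, before the loop.
# timegm() takes (sec, min, hour, mday, month 0-11, year).
my $range_start = timegm(0, 0, 6,  26, 2, 2011);   # 26/Mar/2011:06:00:00 UTC
my $range_end   = timegm(0, 0, 12, 26, 2, 2011);   # 26/Mar/2011:12:00:00 UTC

# Per line, only two integer compares are needed (half-open: [start, end)).
sub in_range {
    my ($epoch) = @_;
    return $epoch >= $range_start && $epoch < $range_end;
}

# Usage: epoch values as they might come out of an epoch() routine.
for my $t (timegm(59, 59, 5, 26, 2, 2011), timegm(30, 0, 6, 26, 2, 2011)) {
    printf "%d %s\n", $t, in_range($t) ? "in range" : "out of range";
}
```

The half-open interval is a design choice: it lets consecutive ranges share an endpoint without a line being counted twice.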
To make things really fast, you will have to do some benchmarking. Run the script doing nothing except reading the file, then add in the split and see what that costs, then add in the time conversion and see what that costs. Consider trying a regex to extract the date; sometimes that is faster than split, but only testing will tell. Assuming the file is ordered by time, something like a binary search to get you near the start of your search range has the potential to really speed things up, at the cost of a big increase in complexity.
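One way to run the split-versus-regex comparison is the core Benchmark module's cmpthese(). This is a sketch on a single hypothetical line; the helper names by_split() and by_regex() are made up for illustration, and the regex assumes the first bracketed text on the line is the timestamp. Real timings depend on your actual data, so benchmark against a representative sample of your log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line; substitute real lines from your log.
my $line = "host1\tGET\t/index.html\t[26/Mar/2011:06:00:00 ]\t200\t512\tMozilla";

# Approach 1: split with a LIMIT, then take the 4th field.
sub by_split {
    my ($l) = @_;
    return (split /\t/, $l, 5)[3];
}

# Approach 2: a regex that captures the first bracketed chunk, brackets included.
# Assumes no earlier field contains a '['.
sub by_regex {
    my ($l) = @_;
    my ($t) = $l =~ /(\[[^\]]*\])/;
    return $t;
}

# Make sure both approaches extract the same thing before timing them.
by_split($line) eq by_regex($line)
    or die "extractors disagree\n";

# Run each for at least 1 CPU second and print a relative-speed table.
cmpthese(-1, {
    split => sub { by_split($line) },
    regex => sub { by_regex($line) },
});
```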
Update: I don't know who controls the time format; often we don't get a choice. But if you do, something like YYYY-MM-DD HH:MM:SS, with leading zeros always present, is a good idea. "2011-03-26 14:01:35" can be compared directly against a similar string with lt, gt, and eq (or sorted as ASCII) and the order will "work out" without any conversion. This format also maps very directly onto many database time formats. Keep times in UTC (GMT) for all logging functions and translate into local time only as needed for presentation.

```perl
#!/usr/bin/perl -w
use strict;
use Time::Local;
#use Time::Local 'timegm_nocheck';  # faster non-error-checked version (call timegm_nocheck instead)

# Month abbreviation -> 0-based month number, as timegm() expects
my %montext2num = ('Jan' => 0, 'Feb' => 1, 'Mar' => 2,  'Apr' => 3,
                   'May' => 4, 'Jun' => 5, 'Jul' => 6,  'Aug' => 7,
                   'Sep' => 8, 'Oct' => 9, 'Nov' => 10, 'Dec' => 11);

my $x = epoch('[26/Mar/2011:06:00:00 ]');
print "epoch=$x\n";

sub epoch
{
    my $log_time = shift;  # like [26/Mar/2011:06:00:00.....blah]
    my ($day, $mon, $year, $hour, $min, $sec) =
        $log_time =~ m|(\d+)/(\w+)/(\d+):(\d\d):(\d\d):(\d\d)|;
    my $month = $montext2num{$mon};
    # seconds since the epoch, treating the log time as UTC
    return (timegm($sec, $min, $hour, $day, $month, $year));
}
```
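A small sketch of the fixed-width format suggested in the update: zero-padded "YYYY-MM-DD HH:MM:SS" strings can be range-checked with plain ge/lt string compares, and gmtime() plus sprintf() will produce the format in UTC. The sample timestamps and the helper name utc_stamp() are hypothetical. sprintf is used here instead of POSIX strftime() only to stay with the low-level approach discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed-width, zero-padded stamps order correctly as plain strings,
# so no epoch conversion is needed for a range check.
my ($from, $to) = ('2011-03-26 06:00:00', '2011-03-26 12:00:00');

for my $stamp ('2011-03-26 05:59:59', '2011-03-26 14:01:35', '2011-03-26 09:15:00') {
    my $hit = ($stamp ge $from && $stamp lt $to) ? 'in range' : 'out of range';
    print "$stamp $hit\n";
}

# Hypothetical helper: format an epoch value as a UTC "YYYY-MM-DD HH:MM:SS" stamp.
sub utc_stamp {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($epoch);
    return sprintf '%04d-%02d-%02d %02d:%02d:%02d',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

print utc_stamp(1301119200), "\n";   # 2011-03-26 06:00:00
```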