The code has a lot of futzing around with splits, substitutions and slow time conversion routines.

I would get the time ASAP. Split can be a time-consuming critter; cut it short by using the limit parameter on split.

    my @tab_delimited_array = split(/\t/,$_,5);

I agree with happy.barney. The routines with %Y, %H etc. are really slow compared with the low-level functions. strftime() is famous for being slow. I would send $tab_delimited_array[3] directly into code like the epoch routine below.
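
To show what the limit does, here is a small sketch. The sample line and its field layout are made up for illustration (I am assuming the timestamp sits in field 3, as in the split above); your real log will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-delimited log line; the timestamp is assumed to be field 3.
my $line = join "\t", 'host1', 'GET', '200', '[26/Mar/2011:06:00:00 +0000]',
                      '/index.html', 'Mozilla/5.0', 'more', 'fields';

# With a limit of 5, split produces at most 5 fields. Everything after
# the fourth tab is left unsplit in the last element.
my @fields = split /\t/, $line, 5;

print scalar(@fields), " fields\n";
print "time field: $fields[3]\n";
print "remainder:  $fields[4]\n";
```

The remainder still contains its tabs, so no work is spent cutting up fields that are never looked at.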

timegm() is implemented very efficiently and it caches months that it has seen before - there are a few math operations and bingo, you have the epoch time number. Even if it happens to do some multiplies, no big deal: on a modern Intel processor they are about the same speed as other integer ops! There is a "no error checking" version, timegm_nocheck, that you can import, although I don't think that you will need to.
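
A quick sketch of the two variants side by side, assuming valid input. The date is just an example; note that months are 0-based, so March is 2.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm timegm_nocheck);

# sec, min, hour, day, month (0-based), year: 26 Mar 2011 06:00:00 UTC
my @parts = (0, 0, 6, 26, 2, 2011);

my $checked   = timegm(@parts);          # validates the ranges of its arguments
my $unchecked = timegm_nocheck(@parts);  # skips that validation

print "checked=$checked unchecked=$unchecked\n";
```

For valid input both return the same number; the unchecked version only makes sense if you trust the data.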

Calculate your "search for range" in advance, convert it to epoch integers, and then a couple of integer compares get you a yes/no decision quickly.
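
Something along these lines, as a minimal sketch. The range boundaries and the helper name in_range are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert the range boundaries once, before the read loop.
my $range_start = timegm(0, 0, 6, 26, 2, 2011);   # 26 Mar 2011 06:00:00 UTC
my $range_end   = timegm(0, 0, 7, 26, 2, 2011);   # 26 Mar 2011 07:00:00 UTC

# Hypothetical helper: two integer compares per log line.
sub in_range {
    my ($epoch) = @_;
    return $epoch >= $range_start && $epoch <= $range_end;
}

for my $t (timegm(0, 30, 6, 26, 2, 2011), timegm(0, 30, 8, 26, 2, 2011)) {
    print "$t: ", (in_range($t) ? "keep" : "skip"), "\n";
}
```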

To make things really fast, you will have to do some benchmarking. Run it without doing anything except reading the file, add in the split and see what that does, then add in the time conversion and see what that does. Consider trying a regex to extract the date; sometimes that is faster than using split, but testing is required. Doing something like a binary search to get you near the start of your "search range" has the potential to really speed things up, but at the cost of a huge increase in complexity (and it assumes this is an ordered file).
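
One way to run that kind of comparison is the core Benchmark module. This is only a sketch: the sample line is invented, the sub names are mine, and the numbers will depend on your real data, so treat it as a starting point rather than a verdict.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical log line: timestamp in field 3, followed by many more fields.
my $line = join "\t", 'host1', 'GET', '200', '[26/Mar/2011:06:00:00 +0000]',
                      '/index.html', ('filler') x 20;

# Three ways of getting at field 3.
sub by_split_all   { my @f = split /\t/, $line;    return $f[3] }
sub by_split_limit { my @f = split /\t/, $line, 5; return $f[3] }
sub by_regex       { my ($t) = $line =~ /^(?:[^\t]*\t){3}([^\t]*)/; return $t }

# A negative count means: run each candidate for at least that many CPU seconds.
cmpthese(-1, {
    split_all   => \&by_split_all,
    split_limit => \&by_split_limit,
    regex       => \&by_regex,
});
```

cmpthese prints a table of rates and relative percentages, which makes it easy to see whether a change is worth keeping.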

    #!/usr/bin/perl -w
    use strict;
    use Time::Local;
    #use Time::Local 'timegm_nocheck'; #faster non error checked version

    my %montext2num = ('Jan' => 0, 'Feb' => 1, 'Mar' => 2, 'Apr' => 3,
                       'May' => 4, 'Jun' => 5, 'Jul' => 6, 'Aug' => 7,
                       'Sep' => 8, 'Oct' => 9, 'Nov' => 10, 'Dec' => 11);

    my $x = epoch('[26/Mar/2011:06:00:00 ]');
    print "epoch=$x\n";

    sub epoch
    {
        my $log_time = shift;   # like [26/Mar/2011:06:00:00.....blah]
        my ($day,$mon,$year,$hour,$min,$sec) =
            $log_time =~ m|(\d+)/(\w+)/(\d+):(\d\d):(\d\d):(\d\d)|;
        my $month = $montext2num{$mon};
        return (timegm($sec, $min, $hour, $day, $month, $year));
    }

Update: I don't know who controls the time format - often we don't get a choice - but if you do, then something like YYYY-MM-DD HH:MM:SS, where the leading zeroes are important, is a good idea. "2011-03-26 14:01:35" can be directly compared against a similar string with lt, gt, eq (or ASCII sorted) and the order will "work out" without conversions. This format also translates very directly into many database time formats. Keep time in UTC (GMT) for all logging functions and translate into local time as needed for presentation.
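
A small sketch of the string-compare idea. The timestamps and range are invented, and it assumes every value is zero-padded and in the same time zone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Zero-padded YYYY-MM-DD HH:MM:SS strings, all assumed to be UTC.
my @stamps = ('2011-03-26 14:01:35', '2011-03-26 06:00:00', '2010-12-31 23:59:59');
my ($from, $to) = ('2011-03-26 00:00:00', '2011-03-26 23:59:59');

# A plain string sort is also a chronological sort for this format.
my @sorted = sort @stamps;

# Range test with string comparison operators, no epoch conversion needed.
my @hits = grep { $_ ge $from && $_ le $to } @sorted;

print "$_\n" for @hits;
```

This only works because each field has a fixed width; "2011-3-26" would sort in the wrong place.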

In reply to Re: Optimise the script by Marshall
in thread Optimise the script by Anonymous Monk
