Unless your data is sorted and the file's lines (or records) are fixed in length, your solution will never be faster than O(n), because every line has to be read at least once. However, there's a lot of room to improve the runtime even if the order of growth doesn't change.

One aspect to consider is how often you expect to see a match in the 16GB input file. If matching records are sparse, you can gain a lot by rejecting non-matches and short-circuiting the loop's iteration as early as possible. Right now you split the line, massage $tab_delimited_array[3], and run it through Unix_Date and Date_ConvTZ before finally testing whether $date_converted is the same as $extracted_YMD. Couldn't you instead massage your $date_converted into something that more closely approximates the raw format of the date as it appears in the 16GB file? That would allow for faster rejection of unneeded lines.
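Here is a minimal sketch of that early-rejection idea. I haven't seen your data, so everything about the format is an assumption: that the fourth tab-delimited field starts with a raw date shaped like YYYY-MM-DD, and that you can work out ahead of time which raw dates could possibly convert to your target day (usually two of them once a timezone shift is involved). make_prefilter is a name I made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a cheap test that says whether a line is
# worth the expensive split / Unix_Date / Date_ConvTZ treatment.
# ASSUMPTION: field 4 begins with a raw date like "2012-03-14".
sub make_prefilter {
    my @raw_days = @_;                       # raw dates that might match
    my %candidate = map { $_ => 1 } @raw_days;

    return sub {
        my ($line) = @_;
        # Skip three tab-delimited fields, capture the date at the
        # start of the fourth. No split, no date math.
        return 0
            unless $line =~ /^(?:[^\t]*\t){3}(\d{4}-\d{2}-\d{2})/;
        return exists $candidate{$1} ? 1 : 0;
    };
}

# Intended use inside the existing loop (sketch only):
#   my $maybe = make_prefilter('2012-03-14', '2012-03-15');
#   while (<$fh>) {
#       next unless $maybe->($_);   # cheap rejection
#       ...                         # full conversion and exact test
#   }
```

The prefilter only has to be conservative. It may let through lines that the full conversion later rejects, but it must never reject a line that would have matched, which is why the candidate list should cover every raw date the timezone conversion could map onto the target day.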

Second, if it turns out that matches are frequent, you might be spending too much time on frequent print calls. You could push $_ onto a cache array and print the array every 1000 iterations, for example, then do a final flush after the loop terminates. That chunk is small enough not to introduce memory problems, while still reducing the time spent in IO calls.
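The caching could look something like the following sketch. make_batched_writer is a hypothetical helper of my own, not something from your script, and the batch size of 1000 is just the example figure from above; measure before settling on a number.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($add, $flush) closures that queue lines
# and write them to $out_fh in batches of $batch_size.
sub make_batched_writer {
    my ($out_fh, $batch_size) = @_;
    my @cache;

    my $flush = sub {
        print {$out_fh} @cache if @cache;   # one print call per batch
        @cache = ();
    };
    my $add = sub {
        push @cache, $_[0];
        $flush->() if @cache >= $batch_size;
    };
    return ($add, $flush);
}

# Intended use (sketch only):
#   my ($add, $flush) = make_batched_writer(\*STDOUT, 1000);
#   while (<$fh>) {
#       next unless ...;    # your match test
#       $add->($_);
#   }
#   $flush->();             # final flush after the loop
```

Don't forget the final flush, or the last partial batch never gets written.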


Dave


In reply to Re: Optimise the script by davido
in thread Optimise the script by Anonymous Monk
