in reply to Large file data extraction

Hold on. Do you really slurp a 4.5 GB file into memory? On anything with less than about 20 GB of memory that will cause thrashing like you wouldn't believe (err, ok, maybe you would - you've seen it)!

It looks like you are parsing HTML, so you should take a hard look at modules like HTML::Parser to do a lot of the heavy lifting for you.
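
A minimal sketch of the idea (the file name and the anchor-extraction handler are just placeholders - wire up whatever handlers match your data):

    use strict;
    use warnings;
    use HTML::Parser;

    # HTML::Parser reads the file in chunks, so the whole 4.5 GB never
    # sits in memory at once.
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ \&start_tag, 'tagname, attr' ],
    );

    sub start_tag {
        my ($tag, $attr) = @_;
        # Illustrative only: print every href found in an <a> tag.
        print "$attr->{href}\n" if $tag eq 'a' && exists $attr->{href};
    }

    $p->parse_file('huge.html') or die "Can't parse huge.html: $!";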

If you are not dealing with HTML, then at least nest the while loop in an outer while loop that reads a record at a time rather than slurping the whole file.
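
Something along these lines - the record separator and the inner match are made up for the example; use whatever actually delimits records in your file:

    use strict;
    use warnings;

    open my $fh, '<', 'huge_file.dat' or die "Can't open huge_file.dat: $!";

    local $/ = "</record>";    # end-of-record marker (assumption)

    # Outer loop: one record at a time, never the whole file.
    while (my $record = <$fh>) {
        # Inner loop: your existing per-record processing goes here.
        while ($record =~ /(\S+)=(\S+)/g) {
            print "$1 -> $2\n";
        }
    }
    close $fh;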


Perl reduces RSI - it saves typing

Re^2: Large file data extraction
by tod222 (Pilgrim) on Aug 12, 2008 at 01:43 UTC
    To elaborate on GrandFather's comment:

    4.5GB is a huge amount of data to try to pull into memory in one go.

    Whether or not you take GrandFather's suggestion to use HTML::Parser, you definitely want to take his suggestion of replacing the slurp with a record-at-a-time read.

    A while loop reading a record at a time will allow for useful print statements for debugging or progress reporting.
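
    For example (the record count threshold is arbitrary - pick whatever interval makes sense for your data):

        use strict;
        use warnings;

        open my $fh, '<', 'huge_file.dat' or die "Can't open huge_file.dat: $!";

        my $count = 0;
        while (my $record = <$fh>) {
            # ... process $record here ...

            # Progress report every 100_000 records so you can see it is alive.
            print STDERR "processed $count records\n"
                unless ++$count % 100_000;
        }
        close $fh;
        print STDERR "done: $count records total\n";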