in reply to Large file data extraction

Hold on. Do you really slurp a 4.5 GB file into memory? On anything with less than about 20 GB of memory that will cause thrashing like you wouldn't believe (err, ok, maybe you would - you've seen it)!

It looks like you are parsing HTML, so you should take a hard look at modules like HTML::Parser to do a lot of the heavy lifting for you.
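
A minimal sketch of the idea (the file name and the anchor-extraction handler are just placeholders - wire up whatever handlers match your data):

    use strict;
    use warnings;
    use HTML::Parser;

    # HTML::Parser reads the file in chunks, so the whole 4.5 GB never
    # sits in memory at once.
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ \&start_tag, 'tagname, attr' ],
    );

    sub start_tag {
        my ($tag, $attr) = @_;
        # Illustrative only: print every href found in an <a> tag.
        print "$attr->{href}\n" if $tag eq 'a' && exists $attr->{href};
    }

    $p->parse_file('huge.html') or die "Can't parse huge.html: $!";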

If you are not dealing with HTML, then at least nest the while loop in an outer while loop that reads a record at a time rather than slurping the whole file.
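
Something along these lines - the record separator and the inner match are made up for the example; use whatever actually delimits records in your file:

    use strict;
    use warnings;

    open my $fh, '<', 'huge_file.dat' or die "Can't open huge_file.dat: $!";

    local $/ = "</record>";    # end-of-record marker (assumption)

    # Outer loop: one record at a time, never the whole file.
    while (my $record = <$fh>) {
        # Inner loop: your existing per-record processing goes here.
        while ($record =~ /(\S+)=(\S+)/g) {
            print "$1 -> $2\n";
        }
    }
    close $fh;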


Perl reduces RSI - it saves typing

Re^2: Large file data extraction
by tod222 (Pilgrim) on Aug 12, 2008 at 01:43 UTC
    To elaborate on GrandFather's comment:

    4.5GB is a huge amount of data to try to pull into memory in one go.

    Whether or not you take GrandFather's suggestion to use HTML::Parser, you definitely want to take his suggestion of replacing the slurp with a record-at-a-time read.

    A while loop reading a record at a time will allow for useful print statements for debugging or progress reporting.
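
    For example (the record count threshold is arbitrary - pick whatever interval makes sense for your data):

        use strict;
        use warnings;

        open my $fh, '<', 'huge_file.dat' or die "Can't open huge_file.dat: $!";

        my $count = 0;
        while (my $record = <$fh>) {
            # ... process $record here ...

            # Progress report every 100_000 records so you can see it is alive.
            print STDERR "processed $count records\n"
                unless ++$count % 100_000;
        }
        close $fh;
        print STDERR "done: $count records total\n";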