
I created a dummy ~5GB file.

perl -le 'BEGIN{$,=","} print map int rand 1000, 1..2500 for 1..547_183' > infile.csv

[sk]% time wc -l infile.csv
 547183 infile.csv
2.730u 11.660s 1:32.06 15.6%    

[sk]% time perl -nle '$line++; print $line if eof;' infile.csv
547183
19.600u 4.560s 0:24.16 100.0%

Agreed, 300GB is freaking large! But Perl read this 5GB file in about 24 seconds, so I don't see a huge problem with just reading a 300GB file.

It will be hard for us to identify where the program is stalling without seeing the "do stuff here" block. For example, if the file you are reading is a CSV file and you parse every line into a HUGE list, that will slow your process down. Thinking ahead and designing the right input file for processing can solve runtime issues: if you only need certain portions of each line, you might want to trim the input file down separately before you start your "core" process!
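To make the "trim it down first" idea concrete, here is a rough sketch. The helper name trim_columns and the column numbers are made up for illustration, and it assumes plain comma-separated data with no quoted fields or embedded commas (like my dummy file). It reads one line at a time and uses the LIMIT argument of split so the fields past the last wanted column are never broken out.

```perl
use strict;
use warnings;

# Hypothetical helper: copy only the wanted (0-based) columns of each line
# from $in to $out, one line at a time, so memory use stays flat.
# Assumes simple CSV: no quoted fields, no embedded commas.
sub trim_columns {
    my ($in, $out, @cols) = @_;

    # Highest column index we need; nothing beyond it has to be split.
    my $max = 0;
    for my $c (@cols) { $max = $c if $c > $max }

    while (my $line = <$in>) {
        chomp $line;
        # LIMIT of $max + 2 keeps column $max clean and leaves the
        # rest of the line unsplit in one trailing field.
        my @fields = split /,/, $line, $max + 2;
        # Missing columns on short lines become empty strings.
        print {$out} join(',', map { $_ // '' } @fields[@cols]), "\n";
    }
    return;
}

# Demo on a tiny in-memory "file": keep columns 0 and 2.
my $data = "1,2,3,4\n5,6,7,8\n";
open my $in, '<', \$data or die "open: $!";
my $trimmed = '';
open my $out, '>', \$trimmed or die "open: $!";
trim_columns($in, $out, 0, 2);
close $out;
print $trimmed;
```

For a real run you would open the big file and a new output file instead of the in-memory strings, then point your "core" script at the smaller file.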

Also, have you tried running this script on a smaller file?

% head -10000 inputfile > smallfile

% script smallfile

See if this completes. If it does, then the problem is specific to the large file.

Are there lines inside your while block that do not have to be processed for every record?
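I can't see your loop, so this is only a guess at the kind of thing I mean. In the sketch below (sum_matching and the sample data are invented for the example), the regex is compiled once before the loop, uninteresting records are skipped with next before any parsing happens, and only the one field that is needed gets split off.

```perl
use strict;
use warnings;

# Hypothetical example: sum the first field of the records that match a pattern.
sub sum_matching {
    my ($in, $pattern) = @_;

    my $re  = qr/$pattern/;   # compiled once, outside the loop
    my $sum = 0;

    while (my $line = <$in>) {
        next unless $line =~ $re;            # cheap test first, skip early
        my ($first) = split /,/, $line, 2;   # break out only the field we need
        $sum += $first;
    }
    return $sum;
}

# Demo on a tiny in-memory "file".
my $data = "10,keep\n20,skip\n30,keep\n";
open my $in, '<', \$data or die "open: $!";
my $total = sum_matching($in, 'keep');
print "$total\n";
```

Anything that gives the same answer for every record (opening lookup files, building hashes, compiling patterns) is a candidate for moving above the while.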

cheers

PS: Just curious, what kind of application requires a 300GB file? How do you manage such large files? The very thought of backing them up scares me :)

In reply to Re: Iterating through HUGE FILES by sk
in thread Iterating through HUGE FILES by jmaya
