perl -le 'BEGIN{$,=","} print map int rand 1000, 1..2500 for 1..547_183' > infile.csv

[sk]% time wc -l infile.csv
547183 infile.csv
2.730u 11.660s 1:32.06 15.6%

[sk]% time perl -nle '$line++; print +($line-1) if eof;' infile.csv
547183
19.600u 4.560s 0:24.16 100.0%
Agreed, 300GB is freaking large! But Perl was able to read this 5GB file very fast, so I don't see a huge problem with just reading a 300GB file.
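For what it's worth, here is a minimal sketch of the kind of line-by-line loop I mean: it streams the file, so only one line is ever held in memory no matter how big the file is. The filename is the test file from above; substitute your own.

#!/usr/bin/perl
use strict;
use warnings;

# Stream the file one line at a time; memory use stays flat because we
# never slurp the whole file into a list.
open my $fh, '<', 'infile.csv' or die "Cannot open infile.csv: $!";
my $count = 0;
while ( my $line = <$fh> ) {
    chomp $line;
    $count++;    # your "do stuff here" block goes here
}
close $fh;
print "Read $count lines\n";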
It will be hard for us to identify where the program is stalling without seeing the "do stuff here" block. For example, if the file you are reading is a CSV file and you parse every line into a HUGE list, that alone will slow your process down. Thinking ahead and designing the right input file for processing will head off runtime issues. For example, if you only need certain portions of each line, you might want to trim the input file down in a separate pass before you start your "core" process, as sketched below.
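A one-liner along these lines does that trimming in a single cheap pass (the column positions are made up for illustration; adjust to whichever fields you actually need):

perl -F, -lane 'print join ",", @F[0,2]' infile.csv > trimmed.csv

That keeps only the 1st and 3rd comma-separated fields of each line, so the "core" script then iterates over far less data.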
Also, have you tried running this script on a smaller file?
% head -10000 inputfile > smallfile
% script smallfile
See if this completes. If it does, then the problem has something to do with the large file itself.
Are there lines inside your while block that do not have to be processed for every record? If so, hoist them out of the loop, as sketched below.
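Here is a rough sketch of keeping the per-record work minimal (the lookup hash, the pattern, and the column layout are all hypothetical):

#!/usr/bin/perl
use strict;
use warnings;

# Build anything that is the same for every record -- lookup tables,
# compiled patterns, open output handles -- once, before the loop.
my %wanted = map { $_ => 1 } ( 42, 99, 500 );    # hypothetical lookup, built once
my $re     = qr/^\d+,/;                          # hypothetical pattern, compiled once

open my $fh, '<', 'infile.csv' or die "Cannot open infile.csv: $!";
while ( my $line = <$fh> ) {
    next unless $line =~ $re;                    # cheap test first, skip early
    chomp $line;
    my ($first) = split /,/, $line, 2;           # split off only what is needed
    next unless $wanted{$first};
    # ... expensive per-record work only for the records that matter
}
close $fh;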
cheers
SK
PS: Just curious, what kind of application requires a 300GB file? How do you manage such large files? The very thought of backing them up scares me :)
In reply to Re: Iterating through HUGE FILES by sk, in thread Iterating through HUGE FILES by jmaya