Sometimes I find myself using Perl to parse and frob flatfiles that end up getting loaded via bulk loaders (sqlldr, etc.) into databases. These files get BIG - like one-line records of up to 3K, and up to 14 million records across a bunch of files.
What's the best way to get the best possible I/O performance out of Perl? Up till now I've been doing it the obtuse way...
foreach my $file (@files) {
    open(FILE, $file) or die "Nya, nya: $!\n";
    while (my $line = <FILE>) {
        # We often use | delimiters....
        my @fields = split(/\|/, $line);
        # Do something nifty with the fields...
        print OUTPUT join("|", @fields);
    }
}
This is one of those situations where if I could save a minuscule amount of time per record, it could potentially shave half an hour off the run time of these monster processing jobs.
What's going on behind the scenes when you read a file one line at a time? Would it be better to read big buffers (say, 100K at a shot) and then go line by line through the buffer until it's exhausted? Is there a module that already does this? How can I optimize the performance of split() in this situation?
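To make the buffered idea concrete, here is a rough sketch of what I mean - @ARGV stands in for the real file list, STDOUT for the OUTPUT handle, and 100K is just the chunk size mentioned above, not a tuned value:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $CHUNK = 100 * 1024;    # read ~100K at a shot

    foreach my $file (@ARGV) {
        open(my $in, '<', $file) or die "Can't open $file: $!\n";
        my $partial = '';
        while (read($in, my $buf, $CHUNK)) {
            $buf = $partial . $buf;
            my $last_nl = rindex($buf, "\n");
            if ($last_nl < 0) {            # no complete line in the buffer yet
                $partial = $buf;
                next;
            }
            $partial = substr($buf, $last_nl + 1);    # keep the trailing fragment
            foreach my $line (split /\n/, substr($buf, 0, $last_nl)) {
                my @fields = split /\|/, $line;
                # Do something nifty with the fields...
                print STDOUT join('|', @fields), "\n";
            }
        }
        # A final line with no trailing newline still needs processing
        if (length $partial) {
            my @fields = split /\|/, $partial;
            print STDOUT join('|', @fields), "\n";
        }
        close $in;
    }

Whether hand-rolling the buffering like this actually beats Perl's own readline buffering is exactly what I'd want to measure rather than assume.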
I guess this is a classic optimization question - I've got a loop, and it's going to be run millions upon millions of times. Any suggestions on how to make the loop run faster would be greatly appreciated.
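For timing candidates against each other, one sketch of a harness using the core Benchmark module (the fake ~3K record and the two split variants are just placeholders, not the real record format):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Build a fake pipe-delimited record roughly 3K long
    my $line = join('|', ('x' x 40) x 70) . "\n";

    cmpthese(-3, {
        split_all    => sub { my @f = split /\|/, $line },
        split_first6 => sub { my @f = split /\|/, $line, 6 },  # stop after 6 fields
    });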
In reply to Fastest I/O possible? by Anonymous Monk