Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Sometimes I find myself using perl to parse and frob flatfiles that end up getting loaded via bulk loaders (sqlldr, etc) into databases. These files get BIG - like one-line records of up to 3K, and up to 14 million records across a bunch of files.
What's the best way to get the best possible I/O performance out of perl? Up till now I've been doing it the obtuse way...
    foreach my $file (@files) {
        open(FILE, $file) or die "Nya, nya: $!\n";
        while (my $line = <FILE>) {
            # We often use | delimiters....
            # (the pipe must be escaped; a bare /|/ splits between every character)
            my @fields = split(/\|/, $line);
            # Do something nifty with the fields...
            print OUTPUT join("|", @fields);
        }
        close(FILE);
    }
This is one of those situations where saving even a minuscule amount of time per record could potentially shave half an hour off the run time of these monster processing jobs.
What's going on behind the scenes when you read a file one line at a time? Would it be better to read big buffers, (say 100K at a shot) and then go line by line from the buffer until it's exhausted? Is there a module that already does this? How can I optimize the performance of split() in this situation?
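To make the "big buffers" idea concrete, here is a minimal sketch of what it might look like. The helper name `process_in_chunks`, its arguments, and the 100K block size are assumptions for illustration, not an existing module. It reads fixed-size blocks with `read()`, carries any partial line over to the next block, and hands complete lines to a callback. Note that Perl's `<FILE>` is already buffered behind the scenes, so whether this actually wins anything is something to measure rather than assume.

```perl
use strict;
use warnings;

# Hypothetical helper: read $path in blocks of $chunk_size bytes and call
# $callback once per line (trailing newline included, as with <FILE>).
sub process_in_chunks {
    my ($path, $chunk_size, $callback) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    binmode($fh);

    my $tail = '';    # partial line left over from the previous block
    my $block;
    while (read($fh, $block, $chunk_size)) {
        $block = $tail . $block;

        # Everything after the last newline is an incomplete line.
        my $cut = rindex($block, "\n");
        if ($cut < 0) {           # no newline in this block yet
            $tail = $block;
            next;
        }
        $tail  = substr($block, $cut + 1);
        $block = substr($block, 0, $cut + 1);

        # split(/^/, ...) breaks a string into lines, keeping the newlines.
        $callback->($_) for split(/^/, $block);
    }
    $callback->($tail) if length $tail;    # last line had no newline
    close($fh);
}
```

Usage would be something like `process_in_chunks($file, 100 * 1024, sub { my @fields = split(/\|/, $_[0]); ... })`, which keeps the per-record code the same as in the loop above.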
I guess this is a classic optimization question - I've got a loop, and it's going to be run millions upon millions of times. Any suggestions on how to make the loop run faster would be greatly appreciated.
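One thing that may be worth trying on the `split()` side, sketched under the assumption that only the first few fields of each record actually need frobbing: pass a LIMIT so `split` stops scanning early instead of chopping up all 3K of the line. The helper name `first_fields` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the first $n pipe-delimited fields.
sub first_fields {
    my ($line, $n) = @_;
    chomp $line;
    # A LIMIT of $n + 1 makes split stop after $n delimiters; the final
    # element then holds the unsplit remainder of the line.
    my @fields = split(/\|/, $line, $n + 1);
    $#fields = $n - 1 if @fields > $n;    # drop the unsplit remainder
    return @fields;
}

print join(',', first_fields("a|b|c|d|e\n", 2)), "\n";    # prints "a,b"
```

If every field is needed this buys nothing, so it only helps for jobs that touch a leading subset of the columns.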
Re: Fastest I/O possible?
by broquaint (Abbot) on Aug 23, 2002 at 01:38 UTC
by sauoq (Abbot) on Aug 23, 2002 at 01:47 UTC

Re: Fastest I/O possible?
by Aristotle (Chancellor) on Aug 23, 2002 at 02:39 UTC

Re: Fastest I/O possible?
by dws (Chancellor) on Aug 23, 2002 at 04:19 UTC

Re: Fastest I/O possible?
by BrowserUk (Patriarch) on Aug 23, 2002 at 05:20 UTC

(tye)Re: Fastest I/O possible?
by tye (Sage) on Aug 23, 2002 at 17:53 UTC

Re: Fastest I/O possible?
by sauoq (Abbot) on Aug 23, 2002 at 01:41 UTC

Re: Fastest I/O possible?
by mordibity (Acolyte) on Aug 23, 2002 at 14:38 UTC

Re: Fastest I/O possible?
by fglock (Vicar) on Aug 23, 2002 at 14:29 UTC