TROGDOR has asked for the wisdom of the Perl Monks concerning the following question:
    open (FILE, "$path") or die "ERROR: Could not open $path.\n";
    while (1) {
        # Each record starts with a 4-byte header: a 16-bit big-endian
        # record size followed by two one-byte type fields.
        $eof = read (FILE, $header, 4);
        ($size, $code, $ftype) = unpack ("nCC", $header);
        if ($size == 0) {
            print "Size is zero. Exiting.\n";
            last;
        }
        $size = $size - 4;          # payload length excludes the header
        if ($size > 0) {
            $eof = read (FILE, $data, $size);
        }
    }
    close FILE;
Replies are listed 'Best First'.
Re: Perl's poor disk IO performance
by moritz (Cardinal) on Apr 29, 2010 at 19:29 UTC
That said, there is a way to improve IO speed. Perl's normal open, read and readline functions use IO layers, which you can circumvent by using sysopen and sysread.
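For instance, here is a minimal sketch of the OP's read loop rewritten with sysopen and sysread (the error handling and short-read check are additions; since sysread does no buffering or layer processing for you, you must handle short reads yourself):

    use Fcntl qw(O_RDONLY);

    sysopen(my $fh, $path, O_RDONLY)
        or die "ERROR: Could not open $path: $!\n";
    while (1) {
        my $got = sysread($fh, my $header, 4);
        die "read error: $!\n" unless defined $got;
        last if $got < 4;                       # EOF or truncated header
        my ($size, $code, $ftype) = unpack('nCC', $header);
        last if $size == 0;
        sysread($fh, my $data, $size - 4) if $size > 4;
    }
    close $fh;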
Perl 6 - links to (nearly) everything that is Perl 6.
Re: Perl's poor disk IO performance
by BrowserUk (Patriarch) on Apr 29, 2010 at 19:57 UTC
Change open (FILE, "$path") to open (FILE, '<:raw', "$path"). On my system, with that change, your code reading 10 MB takes 0.38 seconds.
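In lexical-filehandle form, the whole fix is the layer in a three-argument open (same $path as in the OP's code):

    # :raw strips the :crlf translation layer (the default on Windows),
    # so byte-wise read()s stop paying for newline translation.
    open(my $fh, '<:raw', $path) or die "ERROR: Could not open $path: $!\n";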
Re: Perl's poor disk IO performance
by roboticus (Chancellor) on Apr 29, 2010 at 21:33 UTC
A few minor notes: Just as a reference, I do quite a lot of file processing (in my day job) in both C/C++ and Perl. I often find that the Perl versions run slower, but not enough that I want to write all my processing programs in C/C++. Regexes and complex data munging are so much simpler in Perl that when I have to whack a file using a good bit of intelligence, I tend to reach for Perl first. If I need to whack a large file with just a little simple code and speed is of the essence, I tend to use C/C++. Only rarely do I find that I have to optimize a Perl program or rewrite it in C/C++. As usual, YMMV.

...roboticus
Re: Perl's poor disk IO performance
by Marshall (Canon) on Apr 29, 2010 at 23:43 UTC
I'd be curious whether open($fh, '<:unix', $path) produces further speed improvements beyond :raw. You didn't post the C code, so I'm not 100% sure that we have an apples-to-apples comparison here; there may be some detail that makes it not quite the same. BTW, are you on a Unix or a Windows platform? I don't think that matters, but it might in some weird way that I don't understand right now.

I've written binary-manipulation code in Perl before, for things like concatenating .wav files. I wouldn't normally think of Perl for a massive amount of binary number crunching, but it can do it! Most of my code works with ASCII, and huge amounts of time can get spent in the splitting and global regex matching; I have one app where 30% of the time is spent doing just that. The raw reading/writing to the disk is usually not an issue in my code, as there are other considerations that take far more time.

Update: see the fine benchmarks from BrowserUk. It appears that :perlio plus setting binmode($fh) is the way to go.
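For reference, the two variants in question look like this (a sketch; $path is the OP's file):

    # :unix is the lowest-level layer: unbuffered, each read() is a syscall.
    open(my $unix_fh, '<:unix', $path) or die "open: $!";

    # :perlio is the buffered layer; binmode() then makes it binary-safe.
    open(my $perlio_fh, '<:perlio', $path) or die "open: $!";
    binmode($perlio_fh);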
by BrowserUk (Patriarch) on Apr 30, 2010 at 00:20 UTC
That's interesting. As is often the case with Perl, things move (silently) on as new versions appear. I just re-ran a series of tests that I last performed shortly after IO layers were added. Back then, on my system ':raw' was exactly equivalent to using binmode. It no longer is, nor is either the fastest option. Using this:
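(BrowserUk's original test script isn't reproduced in this copy; the following is a minimal sketch of that kind of layer comparison, with a hypothetical 4 KB read size and layer list:)

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $path = shift or die "usage: $0 <file>\n";

    # Time a full read of the file under each layer spec, with and
    # without an explicit binmode() on top.
    for my $layers ('<', '<:raw', '<:perlio', '<:unix', '<:crlf:raw') {
        for my $bin (0, 1) {
            my $start = time;
            open my $fh, $layers, $path or die "open $layers: $!";
            binmode $fh if $bin;
            1 while read $fh, my $buf, 4096;
            close $fh;
            printf "%-12s binmode=%d : %.3f s\n", $layers, $bin, time - $start;
        }
    }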
You can see (and interpret) the results for yourself:
On my system, I'll be using :perlio & binmode for fast binary access from now on. (Until it changes again. :) Perhaps even more indicative of the lag in the documentation is this:

If :raw popped all layers that were incompatible with binary reading, then :crlf:raw should be as fast as :crlf + binmode. But it ain't!

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
by Marshall (Canon) on Apr 30, 2010 at 19:27 UTC
Re: Perl's poor disk IO performance
by snoopy (Curate) on Apr 29, 2010 at 23:09 UTC
Memory mapping can be a better choice if I/O has been identified as a bottleneck and you want 'semi-random' access to your data, i.e. if you can be a bit selective, skipping records based on the headers and thus skipping significant blocks of data. Even if you are reading sequentially, it's worthwhile benchmarking this against your above solution anyway; it'll help determine whether read really is imposing a performance penalty! For example, the following uses Sys::Mmap:
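(The reply's original snippet isn't shown in this copy; here is a minimal sketch of the idea, walking the OP's record headers in a mapped file and skipping payloads by offset:)

    use strict;
    use warnings;
    use Sys::Mmap;

    open my $fh, '<', $path or die "open: $!";
    my $contents;
    # Map the whole file read-only; a length of 0 means "the entire file".
    mmap($contents, 0, PROT_READ, MAP_SHARED, $fh) or die "mmap: $!";

    my $pos = 0;
    while ($pos + 4 <= length $contents) {
        my ($size, $code, $ftype) = unpack 'nCC', substr($contents, $pos, 4);
        last if $size == 0;
        $pos += $size;    # jump straight past the payload without reading it
    }

    munmap($contents);
    close $fh;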
Re: Perl's poor disk IO performance
by Anonymous Monk on Dec 31, 2010 at 00:20 UTC
The short of it is that I get 50 MB/sec when processing files line-by-line in Perl. Perl file performance is near and dear to my heart, since I routinely work on multi-gigabyte files. I wrote a benchmark program a little while ago to help me stay with Perl, because performance was tempting me to go to C++, or to bypass Perl's buffering and do it myself with large sysread calls.

I ran it on a file that was exactly 100 MB long with lots of small lines, so somewhat of a worst case for a naive line-at-a-time approach. It's a UTF-8 file, and I was particularly interested in figuring out why my Unicode file reading was so pitifully slow on a Windows machine. My fix was to start specifying ":raw:perlio:utf8" on my file handles, which got me a 6x improvement in speed.
Here's the code. Yes, it's pretty crude, but it was enough to tell me what I was doing wrong: PerlIO is the win. The ridiculously large numbers are because the file gets into the Win32 file cache and stays there; that's actually a plus for my benchmark, because it shows me where my bottlenecks are. The large sysread numbers are because no postprocessing is being done, e.g. breaking the file up into lines. Since 55 MB/sec is enough for me at the moment, I'm not looking at writing my own buffering/line-processing code just yet. But it also shows that perlio imposes a tax compared to pure sysread, so maybe someday I'll look at the PerlIO code and see if there are some useful optimizations that won't pessimize something else.
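(The monk's actual benchmark isn't reproduced in this copy; here is a minimal sketch of the readline-versus-raw-sysread comparison it describes, with a hypothetical 1 MB chunk size:)

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $path = shift or die "usage: $0 <file>\n";

    # Buffered, layered, line-at-a-time reading.
    my $t = time;
    open my $fh, '<:raw:perlio:utf8', $path or die "open: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    printf "readline: %d lines in %.3f s\n", $lines, time - $t;

    # Raw unbuffered chunks: no line splitting, no postprocessing.
    $t = time;
    open $fh, '<:raw', $path or die "open: $!";
    1 while sysread $fh, my $buf, 1 << 20;
    close $fh;
    printf "sysread : %.3f s (no postprocessing)\n", time - $t;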
Re: Perl's poor disk IO performance
by kikuchiyo (Hermit) on Apr 30, 2010 at 14:26 UTC
In my experience this can speed up processing. Unless, of course, your .gds files are so big that they don't fit in memory.
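(The snippet this reply refers to isn't shown in this copy; from the "fit in memory" caveat it appears to suggest slurping the whole file and parsing records from the in-memory scalar. A hypothetical sketch:)

    # Slurp the entire .gds file in one read, then walk the records with
    # unpack/substr as in the mmap example above.
    open my $fh, '<:raw', $path or die "open: $!";
    my $data = do { local $/; <$fh> };   # undef $/ disables line splitting
    close $fh;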