PerlMonks  

Re: Perl's poor disk IO performance

by Marshall (Canon)
on Apr 29, 2010 at 23:43 UTC


in reply to Perl's poor disk IO performance

I was looking at http://perldoc.perl.org/PerlIO.html. From my reading, opening with :raw is the same as calling binmode(FILE) after the open. In the past I've just opened the file and then set binmode in a separate statement. Both ways apparently result in a buffered stream. The perldoc above suggests open($fh,"<:unix",$path) as a way to get an unbuffered stream, and it has some other interesting tidbits as well.
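For reference, the three ways of opening a binary file mentioned above can be put side by side. This is only a minimal sketch: the helper name slurp_with, the temporary sample file, and the 64 KB chunk size are assumptions made for illustration and come from neither the thread nor the perldoc page. It checks that all three forms deliver the same bytes; it says nothing about their relative speed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file through the given open mode (hypothetical helper).
# If $use_binmode is true, binmode is applied after the open.
sub slurp_with {
    my ($path, $mode, $use_binmode) = @_;
    open my $fh, $mode, $path or die "Cannot open $path: $!\n";
    binmode $fh if $use_binmode;
    my ($data, $chunk) = ('', '');
    # read() works on buffered handles and on a bare :unix handle alike.
    while (read($fh, $chunk, 65536)) {
        $data .= $chunk;
    }
    close $fh;
    return $data;
}

# Sample file with bytes that a text-mode layer might alter (assumed data).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\x00\x0D\x0A\x1A\xFF" x 1000;
close $tmp or die "Cannot close $path: $!\n";

my $via_raw     = slurp_with($path, '<:raw',  0);  # layer given in the open call
my $via_binmode = slurp_with($path, '<',      1);  # plain open, then binmode
my $via_unix    = slurp_with($path, '<:unix', 0);  # unbuffered form from perldoc PerlIO

printf "%d %d %d bytes read\n",
    length $via_raw, length $via_binmode, length $via_unix;
```

Whether the :unix form is actually faster depends on how many read calls the program makes, since every read then goes straight to the operating system.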

I'd be curious whether open($fh,"<:unix",$path) produces any further speed improvement over :raw. You didn't post the C code, so I'm not 100% sure that we have an "apples to apples" comparison here; there may be some detail that makes the two programs not quite the same. BTW, are you on a Unix or a Windows platform? I don't think that matters, but it might in some weird way that I don't understand right now.

I've written binary manipulation stuff in Perl before for doing things like concatenating .wav files together. I wouldn't normally think of Perl for a massive amount of binary number crunching, but it can do it! Most of my code works with ASCII, and huge amounts of time can get spent in split and global-match regex code. I have one app where 30% of the time is spent doing just that. The raw reading/writing to the disk is usually not an issue in my code, as other parts take far more of the time.

Update: see the fine benchmarks from BrowserUk. It appears that :perlio plus binmode($fh) is the way to go.

Replies are listed 'Best First'.
Re^2: Perl's poor disk IO performance
by BrowserUk (Patriarch) on Apr 30, 2010 at 00:20 UTC

    That's interesting. As is often the case with Perl, things move (silently) on as new versions appear. I just re-ran a series of tests that I last performed shortly after IO layers were added.

    Back then, on my system ':raw' was exactly equivalent to using binmode. It no longer is, nor is either the fastest option.

    Using this:

    #! perl -sl
    # -s lets -LAYER=... and -B on the command line set $LAYER and $B
    use Time::HiRes qw[ time ];

    our $LAYER //= ':raw';
    our $B;

    $s = time;
    open (FILE, "<$LAYER", "junk.bin") or die "ERROR: Could not open junk.bin: $!\n";
    binmode FILE if $B;

    $n = 0;
    while (1) {
        # 4-byte header: 16-bit big-endian size, 1-byte code, 1-byte type
        $eof = read (FILE, $header, 4);
        ($size, $code, $ftype) = unpack ("nCC", $header);
        # print join(':',$size, $code, $ftype, "\n");

        if ($size == 0) {
            print "Size is zero. Exiting";
            last;
        }

        # size includes the header, so the payload is size - 4 bytes
        $size = $size - 4;
        if ($size > 0) {
            $eof = read (FILE, $data, $size);
        }
        $n += 4 + $size;
    }
    close FILE;

    print $n;
    printf "Took %.3f\n", time() - $s;
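The thread does not show how junk.bin was produced. The following is a minimal sketch of one way to build a compatible test file, assuming the record layout implied by unpack("nCC", ...) in the script above: a 16-bit big-endian size that includes the 4-byte header, a code byte, a type byte, then the payload, with a zero-size header as terminator. The names write_records and count_bytes, the payload contents, and the record counts are all hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $count records plus a zero-size terminator; return the record byte total.
sub write_records {
    my ($path, $count, $payload_len) = @_;
    die "Record too large for a 16-bit size field\n" if 4 + $payload_len > 65535;
    open my $out, '>:raw', $path or die "Cannot create $path: $!\n";
    for my $i (1 .. $count) {
        my $payload = chr($i % 256) x $payload_len;   # arbitrary filler bytes
        print {$out} pack('nCC', 4 + $payload_len, $i % 256, 1), $payload;
    }
    print {$out} pack('nCC', 0, 0, 0);                # size 0 marks the end
    close $out or die "Cannot close $path: $!\n";
    return $count * (4 + $payload_len);
}

# Walk the records the same way the benchmark loop does and total their sizes.
sub count_bytes {
    my ($path, $layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!\n";
    binmode $in;
    my ($n, $header, $data) = (0, '', '');
    while ((read($in, $header, 4) // 0) == 4) {
        my ($size) = unpack 'n', $header;
        last if $size == 0;
        if ($size > 4) {
            my $got = read($in, $data, $size - 4) // 0;
            die "Truncated record in $path\n" if $got != $size - 4;
        }
        $n += $size;
    }
    close $in;
    return $n;
}

# Demo on a temporary file; pass 'junk.bin' instead to feed the script above.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
my $written = write_records($path, 1000, 60);
printf "wrote %d record bytes, read back %d\n", $written, count_bytes($path, ':raw');
```

A file built this way only exercises the read loop; its size and record mix will not match the file behind the timings quoted in this thread.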

    You can see (and interpret) the results for yourself:

    On my system, I'll be using :perlio & binmode for fast binary access from now on. (Until it changes again:)

    Perhaps even more indicative of the lag in the documentation is this:

    C:\test>junk41 -LAYER=:crlf
    Size is zero. Exiting
    50466132
    Took 0.668

    C:\test>junk41 -LAYER=:crlf -B
    Size is zero. Exiting
    50466132
    Took 0.283

    C:\test>junk41 -LAYER=:crlf:raw
    Size is zero. Exiting
    50466132
    Took 0.815

    C:\test>junk41 -LAYER=:crlf:raw -B
    Size is zero. Exiting
    50466132
    Took 0.845

    If :raw popped all layers that were incompatible with binary reading, then :crlf:raw should be as fast as :crlf + binmode. But it ain't!
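One way to look into a difference like this is to ask Perl which layers actually end up on the handle. The sketch below uses PerlIO::get_layers, which is described in perldoc PerlIO; the helper name layers_for and the list of cases are assumptions chosen to mirror the command lines above. The stacks it reports vary with the Perl version and platform, so no expected output is given, and a layer listing does not by itself explain timing differences.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the layer stack reported for a handle opened with $layer_spec
# (hypothetical helper); binmode is applied first when $use_binmode is true.
sub layers_for {
    my ($path, $layer_spec, $use_binmode) = @_;
    open my $fh, "<$layer_spec", $path or die "Cannot open $path: $!\n";
    binmode $fh if $use_binmode;
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

# An empty placeholder file is enough; only the handle's layers matter here.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Each case is [ layer spec, whether to call binmode afterwards ].
my @cases = (
    [':raw', 0], [':crlf', 0], [':crlf', 1],
    [':crlf:raw', 0], [':crlf:raw', 1], [':perlio', 1],
);
for my $case (@cases) {
    my ($spec, $bin) = @$case;
    printf "%-10s binmode=%d -> %s\n",
        $spec, $bin, join(' ', layers_for($path, $spec, $bin));
}
```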


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks for the very informative benchmarks!
