in reply to read data 1 byte at a time

Three ways were suggested: using sysread, setting $/, and using getc. So one may wonder which is the fastest. Here is a Benchmark. Note that the results presented are for my system, with my Perl only. Running this with other Perls, or on other systems, may give different results. Always run the Benchmark on your own system before drawing any conclusions, especially when I/O is involved.
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /cmpthese/;
use Fcntl     qw /:DEFAULT :seek/;

my $file = "/etc/passwd";

sysopen our $fh1 => $file, O_RDONLY or die $!;
open    our $fh2 => $file           or die $!;
open    our $fh3 => $file           or die $!;
our ($c1, $c2, $c3) = (0, 0, 0);

cmpthese -10 => {
    sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                 $c1 = 0;
                 $c1 ++ while sysread $fh1 => my $char, 1',
    readline => 'seek $fh2 => 0, SEEK_SET or die;
                 local $/ = \1;
                 $c2 = 0;
                 $c2 ++ while defined (my $char = <$fh2>)',
    getc     => 'seek $fh3 => 0, SEEK_SET or die;
                 $c3 = 0;
                 $c3 ++ while defined (my $char = getc $fh3)',
};

die '$c1 is empty' unless $c1;
die '$c2 is empty' unless $c2;
die '$c3 is empty' unless $c3;
die 'Unequal' unless $c1 == $c2 && $c2 == $c3;

__END__
           Rate  sysread     getc readline
sysread   642/s       --     -49%     -52%
getc     1249/s      94%       --      -7%
readline 1338/s     108%       7%       --
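
Outside the Benchmark harness, the three approaches can be sketched as ordinary subs. This is only an illustration: the sub names and the temporary test file are made up for the example and are not part of the benchmark above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl      qw(:DEFAULT);
use File::Temp qw(tempfile);

# Hypothetical test data: a small temporary file with known content.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "hello\nworld\n";
close $tmp or die $!;

# Unbuffered: one system call per byte.
sub count_sysread {
    my ($file) = @_;
    sysopen(my $fh, $file, O_RDONLY) or die "sysopen $file: $!";
    my ($n, $char) = (0);
    $n++ while sysread($fh, $char, 1);
    close $fh;
    return $n;
}

# Buffered: a reference to an integer in $/ asks for fixed-size records.
sub count_readline {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "open $file: $!";
    binmode $fh;
    local $/ = \1;
    my $n = 0;
    $n++ while defined(my $char = <$fh>);
    close $fh;
    return $n;
}

# Buffered: getc returns undef at end of file.
sub count_getc {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "open $file: $!";
    binmode $fh;
    my $n = 0;
    $n++ while defined(my $char = getc $fh);
    close $fh;
    return $n;
}

printf "%-8s %d\n", sysread  => count_sysread($path);
printf "%-8s %d\n", readline => count_readline($path);
printf "%-8s %d\n", getc     => count_getc($path);
```

All three should report the same count for the same file, which is what the sanity checks at the end of the benchmark verify.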

Re: Re: read data 1 byte at a time
by BrowserUk (Patriarch) on Oct 28, 2003 at 20:43 UTC

    Curious. I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in the sysread test. So I stuck in the brackets, expecting that getting rid of the warning would speed that case up a tad. To my surprise, it doesn't. It slows it down a tad. It's consistent, but I can't explain why.

    For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

    #!/usr/bin/perl
    #! perl -slw
    use strict;

    use Benchmark qw /cmpthese/;
    use Fcntl     qw /:DEFAULT :seek/;

    our $file = $ARGV[0];

    sysopen our $fh1 => $file, O_RDONLY or die $!; binmode $fh1;
    open    our $fh2 => $file           or die $!; binmode $fh2;
    open    our $fh3 => $file           or die $!; binmode $fh3;
    sysopen our $fh4 => $file, O_RDONLY or die $!; binmode $fh4;
    our ($c1, $c2, $c3, $c4) = (0, 0, 0, 0);

    cmpthese -5 => {
        sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                     $c1 = 0;
                     $c1 ++ while sysread $fh1 => my( $char ), 1',
        readline => 'seek $fh2 => 0, SEEK_SET or die;
                     local $/ = \1;
                     $c2 = 0;
                     $c2 ++ while defined (my $char = <$fh2>)',
        getc     => 'seek $fh3 => 0, SEEK_SET or die;
                     $c3 = 0;
                     $c3 ++ while defined (my $char = getc $fh3)',
        substr   => 'seek $fh4 => 0, SEEK_SET or die;
                     my( $char, $data );
                     sysread( $fh4, $data, -s $file );
                     $c4 = length( $data );
                     $char = substr( $data, $_, 1 ) for 0 .. $c4',
    };

    print "$c1 : $c2 : $c3 : $c4";

    die '$c1 is empty' unless $c1;
    die '$c2 is empty' unless $c2;
    die '$c3 is empty' unless $c3;
    die '$c4 is empty' unless $c4;
    die 'Unequal' unless $c1 == $c2 && $c2 == $c3 && $c3 == $c4;

    __END__
    P:\test>test fruit.exe
    Parentheses missing around "my" list at (eval 42) line 3.
    Parentheses missing around "my" list at (eval 44) line 3.
    Parentheses missing around "my" list at (eval 46) line 3.
                Rate sysread    getc readline  substr
    sysread  0.931/s      --    -61%     -71%    -75%
    getc      2.41/s    159%      --     -25%    -34%
    readline  3.20/s    243%     33%       --    -13%
    substr    3.67/s    294%     52%      15%      --
    72192 : 72192 : 72192 : 72192

    P:\test>test fruit.exe
                Rate sysread    getc readline  substr
    sysread  0.876/s      --    -63%     -73%    -76%
    getc      2.35/s    168%      --     -26%    -36%
    readline  3.20/s    265%     36%       --    -13%
    substr    3.66/s    318%     56%      15%      --
    72192 : 72192 : 72192 : 72192
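
Away from the benchmark, the slurp-and-substr idea might look roughly like the sketch below. The sub name byte_histogram and the temporary test file are invented for the illustration. The sketch slurps via an undefined $/ rather than sysread with -s, and it assumes the file fits comfortably in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test data: a small temporary file with known content.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "abcabc";
close $tmp or die $!;

# Read the whole file into one scalar, then walk it with substr.
sub byte_histogram {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "open $file: $!";
    my $data = do { local $/; <$fh> };   # undefined $/ means "read it all"
    close $fh;
    $data = '' unless defined $data;

    my %seen;
    # Offsets run from 0 to length - 1; going up to length itself
    # would add one extra iteration that yields an empty string.
    for my $i (0 .. length($data) - 1) {
        my $char = substr $data, $i, 1;
        $seen{$char}++;
    }
    return \%seen;
}

my $hist = byte_histogram($path);
printf "%s: %d\n", $_, $hist->{$_} for sort keys %$hist;
```

The per-byte work here is a hash increment only to give the loop something to do; any other per-character processing could take its place.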

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

      I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test.
      Strange, I cannot reproduce that. If I check man perldiag, it says:
      Parentheses missing around "%s" list
          (W parenthesis) You said something like
              my $foo, $bar = @_;
          when you meant
              my ($foo, $bar) = @_;
          Remember that "my", "our", and "local" bind tighter than comma.
      but there's no assignment going on here. Note that the sysopen and open lines use the same construct (using our instead of my), and there it doesn't warn. Which version of Perl are you using?
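
To make the perldiag entry concrete, here is a small sketch of the precedence issue it describes. The sub names are made up, and the relevant warning categories are switched off inside the first sub so the demonstration runs quietly. It does not attempt to reproduce the warning from the benchmark itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $bar;

# "my" binds tighter than comma, so this statement is parsed as
#     (my $foo), ($bar = @_);
# $foo is never assigned, and $bar receives @_ in scalar context,
# that is, the number of arguments.
sub without_parens {
    no warnings qw(parenthesis void);   # silenced for the demo only
    my $foo, $bar = @_;
    return ($foo, $bar);
}

# With parentheses, this is a list assignment to two lexicals.
sub with_parens {
    my ($foo, $bar) = @_;
    return ($foo, $bar);
}

my @a = without_parens('x', 'y');
my @b = with_parens('x', 'y');

printf "without parens: foo=%s bar=%s\n",
    defined $a[0] ? $a[0] : 'undef', $a[1];
printf "with parens:    foo=%s bar=%s\n", @b;
```

In the sysread call the my $char is just one argument in a list and nothing is assigned to it, which is why the diagnostic is unexpected there.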
      For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.
      Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer. If I change the 'substr' test to:
      substr => 'seek $fh4 => 0, SEEK_SET or die;
                 my ($char, $data);
                 sysread $fh4 => $data, -s $file;
                 $c4 = 0;
                 $char = substr $data => $c4 ++, 1 for 0 .. length ($data) - 1'
      I get the following results:
                 Rate sysread    getc  substr readline
      sysread  6.48/s      --    -54%    -60%     -63%
      getc     14.0/s    117%      --    -14%     -19%
      substr   16.3/s    152%     16%      --      -6%
      readline 17.4/s    169%     24%      7%       --
      72192 : 72192 : 72192 : 72192
      Of course, the differences between substr and readline are minimal, and even getc is in the same ballpark.

      Abigail

        Which version of Perl are you using?

        v5.8.0 / AS 802. Not sure why the warning is produced. By the evidence of the pod you quoted, it's a bug, but I never encountered it before.

        Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer.

        True! ... but I already knew how many chars there were, so I didn't need to count 'em:)

