
Curious. I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test. So I stuck in the brackets, expecting that getting rid of the warning would speed that case up a tad. To my surprise, it doesn't; it slows it down a tad. The difference is consistent, but I can't explain why.

For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

#!/usr/bin/perl
#! perl -slw

use strict;
use Benchmark qw /cmpthese/;
use Fcntl qw /:DEFAULT :seek/;

our $file = $ARGV[0];

sysopen our $fh1 => $file, O_RDONLY or die $!; binmode $fh1;
open    our $fh2 => $file           or die $!; binmode $fh2;
open    our $fh3 => $file           or die $!; binmode $fh3;
sysopen our $fh4 => $file, O_RDONLY or die $!; binmode $fh4;

our ($c1, $c2, $c3, $c4) = (0, 0, 0, 0);

cmpthese -5 => {
    sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                 $c1 = 0;
                 $c1 ++ while sysread $fh1 => my( $char ), 1',
    readline => 'seek $fh2 => 0, SEEK_SET or die;
                 local $/ = \1;
                 $c2 = 0;
                 $c2 ++ while defined (my $char = <$fh2>)',
    getc     => 'seek $fh3 => 0, SEEK_SET or die;
                 $c3 = 0;
                 $c3 ++ while defined (my $char = getc $fh3)',
    substr   => 'seek $fh4 => 0, SEEK_SET or die;
                 my( $char, $data );
                 sysread( $fh4, $data, -s $file );
                 $c4 = length( $data );
                 $char = substr( $data, $_, 1 ) for 0 .. $c4',
};

print "$c1 : $c2 : $c3 : $c4";

die '$c1 is empty' unless $c1;
die '$c2 is empty' unless $c2;
die '$c3 is empty' unless $c3;
die '$c4 is empty' unless $c4;
die 'Unequal' unless $c1 == $c2 && $c2 == $c3 && $c3 == $c4;

__END__
P:\test>test fruit.exe
Parentheses missing around "my" list at (eval 42) line 3.
Parentheses missing around "my" list at (eval 44) line 3.
Parentheses missing around "my" list at (eval 46) line 3.
            Rate  sysread     getc readline   substr
sysread  0.931/s       --     -61%     -71%     -75%
getc      2.41/s     159%       --     -25%     -34%
readline  3.20/s     243%      33%       --     -13%
substr    3.67/s     294%      52%      15%       --
72192 : 72192 : 72192 : 72192

P:\test>test fruit.exe
            Rate  sysread     getc readline   substr
sysread  0.876/s       --     -63%     -73%     -76%
getc      2.35/s     168%       --     -26%     -36%
readline  3.20/s     265%      36%       --     -13%
substr    3.66/s     318%      56%      15%       --
72192 : 72192 : 72192 : 72192
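
Outside the benchmark harness, the two quickest approaches might look roughly like the sketch below. The sub names bytes_via_substr and bytes_via_readline are made up for illustration, and the slurp here uses readline with $/ undefined rather than the sysread call in the benchmark, so treat it as an approximation of the idea rather than the code that was timed.

```perl
use strict;
use warnings;

# Slurp the whole file, then walk it one byte at a time with substr.
# Returns the number of bytes visited. (Hypothetical helper name.)
sub bytes_via_substr {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    my $data = do { local $/; <$fh> };    # $/ undefined: read everything
    close $fh;
    $data = '' unless defined $data;      # guard for an empty file
    my $count = 0;
    for my $i (0 .. length($data) - 1) {
        my $char = substr $data, $i, 1;   # the byte at offset $i
        $count++;
    }
    return $count;
}

# Read fixed-size, one-byte records by pointing $/ at a reference to 1.
# Returns the number of records read. (Hypothetical helper name.)
sub bytes_via_readline {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    local $/ = \1;
    my $count = 0;
    $count++ while defined(my $char = <$fh>);
    close $fh;
    return $count;
}
```

Both subs should return the same count for the same file, which is the same sanity check the benchmark does with $c1 .. $c4.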

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: read data 1 byte at a time
by Abigail-II (Bishop) on Oct 28, 2003 at 22:11 UTC
    I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test.
    Strange, I cannot reproduce that. If I check man perldiag, it says:
    Parentheses missing around "%s" list
        (W parenthesis) You said something like

            my $foo, $bar = @_;

        when you meant

            my ($foo, $bar) = @_;

        Remember that "my", "our", and "local" bind tighter than comma.
    but there's no assignment going on here. Note that the sysopen and open lines use the same construct (using our instead of my), and there it doesn't warn. Which version of Perl are you using?
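    To make the perldiag wording concrete, here is a small sketch of the case the warning is meant for. The sub names first_two and arg_count are invented for the example; the unparenthesized form is described only in a comment, since under strict it would not compile when the second variable is undeclared.

    ```perl
    use strict;
    use warnings;

    # "my" binds tighter than the comma, so
    #     my $first, $second = @_;
    # parses as
    #     (my $first), ($second = @_);
    # Only $first is declared, and $second gets the element count.

    # With parentheses, both variables are declared and list-assigned.
    sub first_two {
        my ($first, $second) = @_;
        return ($first, $second);
    }

    # Scalar assignment from an array yields the number of elements.
    sub arg_count {
        my $count = @_;
        return $count;
    }

    my @pair = first_two('a', 'b', 'c');
    print "@pair\n";                        # prints "a b"
    print arg_count('a', 'b', 'c'), "\n";   # prints "3"
    ```

    In the sysread test there is only one variable and no assignment, which is why the warning looks out of place there.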
    For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.
    Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer. If I change the 'substr' test to:
    substr => 'seek $fh4 => 0, SEEK_SET or die;
               my ($char, $data);
               sysread $fh4 => $data, -s $file;
               $c4 = 0;
               $char = substr $data => $c4 ++, 1 for 0 .. length ($data) - 1'
    I get the following results:
                Rate  sysread     getc   substr readline
    sysread   6.48/s       --     -54%     -60%     -63%
    getc      14.0/s     117%       --     -14%     -19%
    substr    16.3/s     152%      16%       --      -6%
    readline  17.4/s     169%      24%       7%       --
    72192 : 72192 : 72192 : 72192
    Of course, the differences between substr and readline are minimal, and even getc is in the same ballpark.

    Abigail

      Which version of Perl are you using?

      v5.8.0 / AS 802. Not sure why the warning is produced. By the evidence of the pod you quoted, it's a bug, but I never encountered it before.

      Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer.

      True! ... but I already knew how many chars there were, so I didn't need to count 'em:)


        I see it with 5.8.0 as well. But not with 5.8.1, 5.8.2-RC1, or 5.9.0. So, I guess it's a bug that's fixed in the current release.

        Abigail