Re: Re: read data 1 byte at a time

Curious. I ran your benchmark and got a few warnings of the form 'Parentheses missing around "my" list at (eval 42) line 3.', which I tracked down to the my $char in the sysread test. So I stuck in the brackets, expecting the absence of the warning generation to speed that case up a tad. To my surprise, it doesn't. It slows it down a tad. The slowdown is consistent, but I can't explain why.

For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

#!/usr/bin/perl

#! perl -slw
use strict;

use Benchmark qw /cmpthese/;
use Fcntl     qw /:DEFAULT :seek/;

our $file = $ARGV[0];

sysopen our $fh1 => $file, O_RDONLY or die $!; binmode $fh1;
open    our $fh2 => $file           or die $!; binmode $fh2;
open    our $fh3 => $file           or die $!; binmode $fh3;
sysopen our $fh4 => $file, O_RDONLY or die $!; binmode $fh4;

our ($c1, $c2, $c3, $c4) = (0, 0, 0, 0);

cmpthese -5 => {
    sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                 $c1 = 0;
                 $c1 ++ while sysread $fh1 => my( $char ), 1',
    readline => 'seek $fh2 => 0, SEEK_SET or die;
                 local $/ = \1;
                 $c2 = 0;
                 $c2 ++ while defined (my $char = <$fh2>)',
    getc     => 'seek $fh3 => 0, SEEK_SET or die;
                 $c3 = 0;
                 $c3 ++ while defined (my $char = getc $fh3)',
    substr   => 'seek $fh4 => 0, SEEK_SET or die;
                 my( $char, $data );
                 sysread( $fh4, $data, -s $file );
                 $c4 = length( $data );
                 $char = substr( $data, $_, 1 )  for 0 .. $c4 - 1',

};
print "$c1 : $c2 : $c3 : $c4";

die '$c1 is empty' unless $c1;
die '$c2 is empty' unless $c2;
die '$c3 is empty' unless $c3;
die '$c4 is empty' unless $c4;
die 'Unequal'      unless $c1 == $c2 && $c2 == $c3 && $c3 == $c4;
__END__
P:\test>test fruit.exe
Parentheses missing around "my" list at (eval 42) line 3.
Parentheses missing around "my" list at (eval 44) line 3.
Parentheses missing around "my" list at (eval 46) line 3.
            Rate  sysread     getc readline   substr
sysread  0.931/s       --     -61%     -71%     -75%
getc      2.41/s     159%       --     -25%     -34%
readline  3.20/s     243%      33%       --     -13%
substr    3.67/s     294%      52%      15%       --
72192 : 72192 : 72192 : 72192
P:\test>test fruit.exe
            Rate  sysread     getc readline   substr
sysread  0.876/s       --     -63%     -73%     -76%
getc      2.35/s     168%       --     -26%     -36%
readline  3.20/s     265%      36%       --     -13%
substr    3.66/s     318%      56%      15%       --
72192 : 72192 : 72192 : 72192
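
Outside of a benchmark, the slurp-and-substr approach can be sketched as a small standalone helper. This is only a minimal sketch: the name each_byte and its callback interface are hypothetical and not part of the benchmark above, and it assumes the whole file fits comfortably in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the benchmark): slurp $path in binary
# mode and call $callback once for every byte. Returns the byte count.
sub each_byte {
    my ($path, $callback) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };    # undef $/ reads the whole file
    close $fh;
    $data = '' unless defined $data;

    # Walk the buffer one byte at a time; the range is empty for an empty file.
    $callback->( substr($data, $_, 1) ) for 0 .. length($data) - 1;

    return length $data;
}

# Usage: write five bytes to a temporary file and read them back.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "abc\n\0";
close $tmp_fh or die "Cannot close $tmp_name: $!";

my @seen;
my $count = each_byte($tmp_name, sub { push @seen, ord $_[0] });
print "$count bytes: @seen\n";    # prints "5 bytes: 97 98 99 10 0"
```

The range stops at length($data) - 1, so the loop touches each byte exactly once.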

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!

Re: read data 1 byte at a time by Abigail-II (Bishop) on Oct 28, 2003 at 22:11 UTC
> I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test.

Strange, I cannot reproduce that. If I check man perldiag, it says:

    Parentheses missing around "%s" list
        (W parenthesis) You said something like

            my $foo, $bar = @_;

        when you meant

            my ($foo, $bar) = @_;

        Remember that "my", "our", and "local" bind tighter than comma.

but there's no assignment going on here. Note that the sysopen and open lines use the same construct (using our instead of my), and there it doesn't warn. Which version of Perl are you using?

> For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer. If I change the 'substr' test to:

    substr   => 'seek $fh4 => 0, SEEK_SET or die;
                 my ($char, $data);
                 sysread $fh4 => $data, -s $file;
                 $c4 = 0;
                 $char = substr $data => $c4 ++, 1 for 0 .. length ($data) - 1'

I get the following results:

            Rate  sysread     getc   substr readline
sysread   6.48/s       --     -54%     -60%     -63%
getc      14.0/s     117%       --     -14%     -19%
substr    16.3/s     152%      16%       --      -6%
readline  17.4/s     169%      24%       7%       --
72192 : 72192 : 72192 : 72192

Of course, the differences between substr and readline are minimal, and even getc is in the same ballpark.

Abigail
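
To make the perldiag entry concrete, here is a small sketch of what the unparenthesized form actually does. The sub names are hypothetical, $bar is declared with our only so that the first form compiles under strict, and warnings are switched off inside that sub to silence the very warning under discussion.

```perl
use strict;
use warnings;

our $bar;    # package variable, so the unparenthesized form compiles under strict

# Hypothetical demo sub: "my" binds tighter than the comma, so the
# statement parses as (my $foo), ($bar = @_).
sub without_parens {
    no warnings;            # silence "Parentheses missing around "my" list"
    my $foo, $bar = @_;     # $foo stays undef; $bar gets the argument count
    return ($foo, $bar);
}

# The parenthesized form declares both variables and does a list assignment.
sub with_parens {
    my ($foo, $bar) = @_;
    return ($foo, $bar);
}

my @loose = without_parens('a', 'b');
my @tight = with_parens('a', 'b');

printf "without parens: foo=%s bar=%s\n",
    (defined $loose[0] ? $loose[0] : 'undef'), $loose[1];
printf "with parens:    foo=%s bar=%s\n", @tight;
# prints:
#   without parens: foo=undef bar=2
#   with parens:    foo=a bar=b
```

In the benchmark's sysread line there is a comma after my $char but no assignment, which is why the warning looked out of place there.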
Re: Re: read data 1 byte at a time by BrowserUk (Patriarch) on Oct 28, 2003 at 22:30 UTC
> Which version of Perl are you using?

v5.8.0 / AS 802. Not sure why the warning is produced. By the evidence of the pod you quoted, it's a bug, but I never encountered it before.

> Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer.

True! ... but I already knew how many chars there were, so I didn't need to count 'em :)
Re: read data 1 byte at a time by Abigail-II (Bishop) on Oct 28, 2003 at 22:42 UTC
I see it with 5.8.0 as well. But not with 5.8.1, 5.8.2-RC1, or 5.9.0. So, I guess it's a bug that's fixed in the current release.

Abigail