read data 1 byte at a time

sweetblood has asked for the wisdom of the Perl Monks concerning the following question:

Re: read data 1 byte at a time by jmcnamara (Monsignor) on Oct 28, 2003 at 14:32 UTC
You can set `$/ = \1` to read 1 byte at a time; see `$/` in perlvar.

```perl
#!/usr/bin/perl -wl
$/ = \1;
open FILE, "file" or die "Error message here: $!";
while (<FILE>) {
    print;
}
```

-- John.
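As a minimal sketch of how a reference to an integer in `$/` behaves (not part of the original reply): the example reads from an in-memory filehandle, an assumption made only to keep it self-contained, and the helper name `read_records` is hypothetical. The sample data deliberately contains a `0` and a newline to show that neither ends the loop when the read is tested with `defined`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle in fixed-size records.
sub read_records {
    my ($fh, $size) = @_;
    # A reference to an integer asks readline for records of that many
    # bytes; local restores the previous separator on return.
    local $/ = \$size;
    my @records;
    while (defined(my $record = <$fh>)) {
        push @records, $record;
    }
    return @records;
}

# Sample data: four bytes, including a "0" and a newline.
my $data = "a0\nb";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;
my @bytes = read_records($fh, 1);
close $fh;

print scalar(@bytes), " records\n";
```

Using `local` keeps the changed record separator from leaking into other code that still expects line-based reads.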
Re: read data 1 byte at a time by Anonymous Monk on Oct 28, 2003 at 14:34 UTC
Check out getc for reading one character at a time. If your STDIN needs to read the keyboard, see also Term::ReadKey.

```perl
# Test with defined() so that a "0" character does not end the loop early.
while (defined(my $char = getc)) {
    print "$char\n";
}
__END__
```
Re: read data 1 byte at a time by Abigail-II (Bishop) on Oct 28, 2003 at 17:02 UTC
Three ways were suggested: using sysread, setting $/ and using getc. So, one may wonder which one is the fastest. Here is a Benchmark. Note that one should realize the presented results are for my system, with my Perl only. Running this with other Perls, or on other systems may give different results. Always run the Benchmark on your own system before drawing any conclusions, especially when it involves I/O.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /cmpthese/;
use Fcntl     qw /:DEFAULT :seek/;

my $file = "/etc/passwd";

sysopen our $fh1 => $file, O_RDONLY or die $!;
open    our $fh2 => $file           or die $!;
open    our $fh3 => $file           or die $!;

our ($c1, $c2, $c3) = (0, 0, 0);

cmpthese -10 => {
    sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                 $c1 = 0;
                 $c1 ++ while sysread $fh1 => my $char, 1',
    readline => 'seek $fh2 => 0, SEEK_SET or die;
                 local $/ = \1;
                 $c2 = 0;
                 $c2 ++ while defined (my $char = <$fh2>)',
    getc     => 'seek $fh3 => 0, SEEK_SET or die;
                 $c3 = 0;
                 $c3 ++ while defined (my $char = getc $fh3)',
};

die '$c1 is empty' unless $c1;
die '$c2 is empty' unless $c2;
die '$c3 is empty' unless $c3;
die 'Unequal' unless $c1 == $c2 && $c2 == $c3;

__END__
            Rate  sysread     getc readline
sysread    642/s       --     -49%     -52%
getc      1249/s      94%       --      -7%
readline  1338/s     108%       7%       --
```
Re: Re: read data 1 byte at a time by BrowserUk (Patriarch) on Oct 28, 2003 at 20:43 UTC
Curious. I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the `my $char` in the sysread test. So I stuck in the brackets, expecting the absence of the warning generation to speed that case up a tad. To my surprise it doesn't. It slows it down a tad. It's consistent, but I can't explain why.

For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

```perl
#!/usr/bin/perl
#! perl -slw
use strict;

use Benchmark qw /cmpthese/;
use Fcntl     qw /:DEFAULT :seek/;

our $file = $ARGV[0];

sysopen our $fh1 => $file, O_RDONLY or die $!; binmode $fh1;
open    our $fh2 => $file           or die $!; binmode $fh2;
open    our $fh3 => $file           or die $!; binmode $fh3;
sysopen our $fh4 => $file, O_RDONLY or die $!; binmode $fh4;

our ($c1, $c2, $c3, $c4) = (0, 0, 0, 0);

cmpthese -5 => {
    sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                 $c1 = 0;
                 $c1 ++ while sysread $fh1 => my( $char ), 1',
    readline => 'seek $fh2 => 0, SEEK_SET or die;
                 local $/ = \1;
                 $c2 = 0;
                 $c2 ++ while defined (my $char = <$fh2>)',
    getc     => 'seek $fh3 => 0, SEEK_SET or die;
                 $c3 = 0;
                 $c3 ++ while defined (my $char = getc $fh3)',
    substr   => 'seek $fh4 => 0, SEEK_SET or die;
                 my( $char, $data );
                 sysread( $fh4, $data, -s $file );
                 $c4 = length( $data );
                 $char = substr( $data, $_, 1 ) for 0 .. $c4',
};

print "$c1 : $c2 : $c3 : $c4";

die '$c1 is empty' unless $c1;
die '$c2 is empty' unless $c2;
die '$c3 is empty' unless $c3;
die '$c4 is empty' unless $c4;
die 'Unequal' unless $c1 == $c2 && $c2 == $c3 && $c3 == $c4;

__END__
P:\test>test fruit.exe
Parentheses missing around "my" list at (eval 42) line 3.
Parentheses missing around "my" list at (eval 44) line 3.
Parentheses missing around "my" list at (eval 46) line 3.
            Rate  sysread     getc readline   substr
sysread  0.931/s       --     -61%     -71%     -75%
getc      2.41/s     159%       --     -25%     -34%
readline  3.20/s     243%      33%       --     -13%
substr    3.67/s     294%      52%      15%       --
72192 : 72192 : 72192 : 72192

P:\test>test fruit.exe
            Rate  sysread     getc readline   substr
sysread  0.876/s       --     -63%     -73%     -76%
getc      2.35/s     168%       --     -26%     -36%
readline  3.20/s     265%      36%       --     -13%
substr    3.66/s     318%      56%      15%       --
72192 : 72192 : 72192 : 72192
```

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
Re: read data 1 byte at a time by Abigail-II (Bishop) on Oct 28, 2003 at 22:11 UTC
> I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test.

Strange, I cannot reproduce that. If I check man perldiag, it says:

```
Parentheses missing around "%s" list
    (W parenthesis) You said something like

        my $foo, $bar = @_;

    when you meant

        my ($foo, $bar) = @_;

    Remember that "my", "our", and "local" bind tighter than comma.
```

but there's no assignment going on here. Note that the sysopen and open lines use the same construct (using our instead of my), and there it doesn't warn. Which version of Perl are you using?

> For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

Welllllll, your benchmark isn't fair. All others do two things in the loop: assign a character to $char, and increment an integer. If I change the 'substr' test to:

```perl
substr => 'seek $fh4 => 0, SEEK_SET or die;
           my ($char, $data);
           sysread $fh4 => $data, -s $file;
           $c4 = 0;
           $char = substr $data => $c4 ++, 1 for 0 .. length ($data) - 1'
```

I get the following results:

```
           Rate  sysread     getc   substr readline
sysread  6.48/s       --     -54%     -60%     -63%
getc     14.0/s     117%       --     -14%     -19%
substr   16.3/s     152%      16%       --      -6%
readline 17.4/s     169%      24%       7%       --
72192 : 72192 : 72192 : 72192
```

Of course, the differences between substr and readline are minimal, and even getc is in the same ballpark.

Abigail
Re: Re: read data 1 byte at a time by BrowserUk (Patriarch) on Oct 28, 2003 at 22:30 UTC
Re: read data 1 byte at a time by Abigail-II (Bishop) on Oct 28, 2003 at 22:42 UTC
Re: read data 1 byte at a time by pg (Canon) on Oct 28, 2003 at 15:55 UTC
sysread returns 0 at the end of the file.

```perl
# Check that the open succeeded before reading.
open(AFILE, "<a.pl") or die "Cannot open a.pl: $!";
my $byte;
while (sysread(AFILE, $byte, 1)) {
    print $byte;
}
close(AFILE);
```
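A minimal sketch of telling the two ways a `sysread` loop can stop apart: `sysread` returns 0 at end of file and `undef` on error, and a plain truth test treats both alike. The scratch file created with File::Temp and the variable names are assumptions made only so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so the example is self-contained.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "a0\n";
close $tmp_fh or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
binmode $fh;

my @bytes;
while (1) {
    my $byte;
    my $n = sysread($fh, $byte, 1);
    die "Read error on $path: $!" unless defined $n;   # undef signals an error
    last if $n == 0;                                   # 0 signals end of file
    push @bytes, $byte;
}
close $fh;

printf "read %d bytes\n", scalar @bytes;
```

Whether the extra check is worth it depends on the input; for a regular local file a read error is rare, but for pipes or network filesystems it may matter.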
Re: read data 1 byte at a time by zentara (Cardinal) on Oct 28, 2003 at 16:38 UTC
I don't know if it will help you, but don't forget "unpack" for getting binary data. There are many variations:

```perl
#!/usr/bin/perl
$string = 'once upon a binary bit';
# "C*" unpacks every byte; a bare "C" would return only the first one.
foreach my $byte (unpack "C*", $string) {
    print "$byte\t";
}
print "\n";
```

or

```perl
#!/usr/bin/perl
use strict;
use warnings;

undef $/;    # slurp mode (the string 'undef' would not enable it)
my $in = shift || $0;
open (ZZ, "< $in") or die $!;
binmode ZZ;
my $file = (<ZZ>);
# "L*" unpacks every complete 32-bit value; a bare "L" would return only the first.
my @nums = (unpack "L*", $file);
print "@nums\n";
close ZZ;
```
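A short sketch of how the repeat count in an `unpack` template changes the result: `"C"` yields a single unsigned byte value, while `"C*"` yields one value per byte of the string. The sample string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Four bytes: 'A', 'B', a NUL byte, and 0xFF.
my $string = "AB\0\xFF";

my @all   = unpack "C*", $string;   # one value per byte
my @first = unpack "C",  $string;   # only the first byte

print "all:   @all\n";     # prints "all:   65 66 0 255"
print "first: @first\n";   # prints "first: 65"
```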
Re: read data 1 byte at a time by sweetblood (Prior) on Oct 28, 2003 at 20:43 UTC
Thanks everyone! I have a much better understanding of this. All your suggestions have been helpful and the benchmarks are very revealing.