sweetblood has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a script that needs to process its data 1 byte at a time. First I used sysread and a for loop to step through the data, but then I realised I would also need to take the input from stdin. Here's what I started out with:

#!/usr/bin/perl -w
use strict;
use Fcntl;

sysopen(FH, "data.sg", O_RDONLY) or die "$!\n";
my $flen = (-s "data.sg");
for (my $i=0;$i<$flen;$i++){
    # note: the 4th argument of sysread is an offset into $byte, not into the file
    sysread(FH,my $byte,1,$i);
    # do stuff to $byte
}

Now if I take this approach I won't be able to use stdin as a data source, as I won't know the length of the input. I thought that setting $/ = ""; would give me "byte mode", but no dice. I'd like my finished script to read either a named file or stdin. Is there a way to put Perl into "byte mode", the way undef $/ puts Perl into "slurp mode"?

Thanks in advance

By the way this is an HP-UX system running Perl 5.6.1

Replies are listed 'Best First'.
Re: read data 1 byte at a time
by jmcnamara (Monsignor) on Oct 28, 2003 at 14:32 UTC

    You can set $/ = \1 to read 1 byte at a time, see $/ in perlvar.
    #!/usr/bin/perl -wl

    $/ = \1;

    open FILE, "file" or die "Error message here: $!";

    while (<FILE>) {
        print;
    }

    --
    John.
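
Since the original question asks for either a named file or stdin, the $/ = \1 approach can be wrapped roughly as follows. This is only a sketch: each_byte and open_input are hypothetical helper names that do not appear anywhere in the thread, and the hex dump is just a placeholder for the real per-byte work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): call $callback once per byte
# read from $fh and return the number of bytes seen.
sub each_byte {
    my ($fh, $callback) = @_;
    binmode $fh;               # raw bytes, no newline translation
    local $/ = \1;             # fixed-size records of 1 byte
    my $count = 0;
    while (defined(my $byte = <$fh>)) {
        $callback->($byte);
        $count++;
    }
    return $count;
}

# Hypothetical helper: return a handle for the named file, or STDIN if
# no name was given.
sub open_input {
    my ($name) = @_;
    return \*STDIN unless defined $name;
    open my $fh, '<', $name or die "Cannot open $name: $!\n";
    return $fh;
}

# Example driver: hex-dump every byte of the named file, or of STDIN.
my $total = each_byte(open_input($ARGV[0]), sub { printf "%02x\n", ord $_[0] });
print STDERR "$total byte(s) read\n";
```

Because the loop tests defined() rather than truth, a "0" byte or a NUL byte does not end it early, and because no file length is needed it behaves the same on a pipe as on a regular file.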

Re: read data 1 byte at a time
by Anonymous Monk on Oct 28, 2003 at 14:34 UTC
    Check out getc for reading one character at a time. If your STDIN needs to read the keyboard, see also Term::ReadKey.
    while (defined(my $char = getc)) {   # defined() so that a "0" character does not end the loop
        print "$char\n";
    }
    __END__
Re: read data 1 byte at a time
by Abigail-II (Bishop) on Oct 28, 2003 at 17:02 UTC
    Three ways were suggested: using sysread, setting $/ and using getc. So, one may wonder which one is the fastest. Here is a Benchmark. Note that one should realize the presented results are for my system, with my Perl only. Running this with other Perls, or on other systems may give different results. Always run the Benchmark on your own system before drawing any conclusions, especially when it involves I/O.
    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw /cmpthese/;
    use Fcntl     qw /:DEFAULT :seek/;

    my $file = "/etc/passwd";

    sysopen our $fh1 => $file, O_RDONLY or die $!;
    open    our $fh2 => $file or die $!;
    open    our $fh3 => $file or die $!;

    our ($c1, $c2, $c3) = (0, 0, 0);

    cmpthese -10 => {
        sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                     $c1 = 0;
                     $c1 ++ while sysread $fh1 => my $char, 1',
        readline => 'seek $fh2 => 0, SEEK_SET or die;
                     local $/ = \1;
                     $c2 = 0;
                     $c2 ++ while defined (my $char = <$fh2>)',
        getc     => 'seek $fh3 => 0, SEEK_SET or die;
                     $c3 = 0;
                     $c3 ++ while defined (my $char = getc $fh3)',
    };

    die '$c1 is empty' unless $c1;
    die '$c2 is empty' unless $c2;
    die '$c3 is empty' unless $c3;
    die 'Unequal' unless $c1 == $c2 && $c2 == $c3;

    __END__
               Rate  sysread     getc readline
    sysread   642/s       --     -49%     -52%
    getc     1249/s      94%       --      -7%
    readline 1338/s     108%       7%       --

      Curious. I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in the sysread test. So I stuck in the brackets, expecting the absence of the warning generation to speed that case up a tad. To my surprise it doesn't. It slows it down a tad. It's consistent, but I can't explain why.

      For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.

      #!/usr/bin/perl
      #! perl -slw
      use strict;

      use Benchmark qw /cmpthese/;
      use Fcntl     qw /:DEFAULT :seek/;

      our $file = $ARGV[0];

      sysopen our $fh1 => $file, O_RDONLY or die $!;
      binmode $fh1;
      open    our $fh2 => $file or die $!;
      binmode $fh2;
      open    our $fh3 => $file or die $!;
      binmode $fh3;
      sysopen our $fh4 => $file, O_RDONLY or die $!;
      binmode $fh4;

      our ($c1, $c2, $c3, $c4) = (0, 0, 0, 0);

      cmpthese -5 => {
          sysread  => 'sysseek $fh1 => 0, SEEK_SET or die;
                       $c1 = 0;
                       $c1 ++ while sysread $fh1 => my( $char ), 1',
          readline => 'seek $fh2 => 0, SEEK_SET or die;
                       local $/ = \1;
                       $c2 = 0;
                       $c2 ++ while defined (my $char = <$fh2>)',
          getc     => 'seek $fh3 => 0, SEEK_SET or die;
                       $c3 = 0;
                       $c3 ++ while defined (my $char = getc $fh3)',
          substr   => 'seek $fh4 => 0, SEEK_SET or die;
                       my( $char, $data );
                       sysread( $fh4, $data, -s $file );
                       $c4 = length( $data );
                       $char = substr( $data, $_, 1 ) for 0 .. $c4',
      };

      print "$c1 : $c2 : $c3 : $c4";

      die '$c1 is empty' unless $c1;
      die '$c2 is empty' unless $c2;
      die '$c3 is empty' unless $c3;
      die '$c4 is empty' unless $c4;
      die 'Unequal' unless $c1 == $c2 && $c2 == $c3 && $c3 == $c4;

      __END__
      P:\test>test fruit.exe
      Parentheses missing around "my" list at (eval 42) line 3.
      Parentheses missing around "my" list at (eval 44) line 3.
      Parentheses missing around "my" list at (eval 46) line 3.
                  Rate  sysread     getc readline   substr
      sysread  0.931/s       --     -61%     -71%     -75%
      getc      2.41/s     159%       --     -25%     -34%
      readline  3.20/s     243%      33%       --     -13%
      substr    3.67/s     294%      52%      15%       --
      72192 : 72192 : 72192 : 72192

      P:\test>test fruit.exe
                  Rate  sysread     getc readline   substr
      sysread  0.876/s       --     -63%     -73%     -76%
      getc      2.35/s     168%       --     -26%     -36%
      readline  3.20/s     265%      36%       --     -13%
      substr    3.66/s     318%      56%      15%       --
      72192 : 72192 : 72192 : 72192

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!

        I ran your benchmark and got a few warnings about "Parentheses missing around "my" list at (eval 42) line 3.", which I tracked down to the my $char in sysread test.
        Strange, I cannot reproduce that. If I check man perldiag, it says:
        Parentheses missing around "%s" list
            (W parenthesis) You said something like

                my $foo, $bar = @_;

            when you meant

                my ($foo, $bar) = @_;

            Remember that "my", "our", and "local" bind tighter than comma.
        but there's no assignment going on here. Note that the sysopen and open lines use the same construct (using our instead of my), and there it doesn't warn. Which version of Perl are you using?
        For variety, if the file is less than a couple of hundred megs, slurping the whole thing and using substr to iterate the chars is a fair bit quicker.
        Welllllll, your benchmark isn't fair. All others do two things in the loop, assign a character to $char, and increment an integer. If I change the 'substr' test to:
        substr => 'seek $fh4 => 0, SEEK_SET or die;
                   my ($char, $data);
                   sysread $fh4 => $data, -s $file;
                   $c4 = 0;
                   $char = substr $data => $c4 ++, 1 for 0 .. length ($data) - 1'
        I get the following results:
                   Rate  sysread     getc   substr readline
        sysread  6.48/s       --     -54%     -60%     -63%
        getc     14.0/s     117%       --     -14%     -19%
        substr   16.3/s     152%      16%       --      -6%
        readline 17.4/s     169%      24%       7%       --
        72192 : 72192 : 72192 : 72192
        Of course, the differences between substr and readline are minimal, and even getc is in the same ballpark.

        Abigail

Re: read data 1 byte at a time
by pg (Canon) on Oct 28, 2003 at 15:55 UTC

    Sysread returns 0 at the end of the file.

    open(AFILE, "<a.pl") or die "Cannot open a.pl: $!";
    my $byte;
    while (sysread(AFILE, $byte, 1)) {
        print $byte;
    }
    close(AFILE);
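
One detail worth spelling out: sysread returns 0 at end of file but undef on error, so a bare while loop treats both cases the same way. A sketch that separates them, and that never needs the file length, so it should work on STDIN as well (count_bytes is a made-up name, not something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh one byte at a time with sysread and
# return how many bytes were read.
sub count_bytes {
    my ($fh) = @_;
    my ($byte, $count) = ('', 0);
    while (1) {
        my $got = sysread($fh, $byte, 1);
        die "sysread failed: $!\n" unless defined $got;   # undef means an error
        last if $got == 0;                                # 0 means end of file
        # do stuff to $byte here
        $count++;
    }
    return $count;
}
```

It could be called as count_bytes(\*STDIN) or with any handle from open or sysopen. Since sysread bypasses Perl's buffering, it should not be mixed with <FH>, getc, or seek on the same handle.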
Re: read data 1 byte at a time
by zentara (Cardinal) on Oct 28, 2003 at 16:38 UTC
    I don't know if it will help you, but don't forget "unpack" for getting binary data. There are many variations:
    #!/usr/bin/perl
    $string = 'once upon a binary bit';
    foreach my $byte (unpack "C*", $string) {
        print "$byte\t";
    }
    print "\n";

    or

    #!/usr/bin/perl
    use strict;
    use warnings;

    undef $/;    # slurp mode; assigning the string 'undef' would make "undef" the record separator
    my $in = shift || $0;
    open (ZZ, "< $in") or die $!;
    binmode ZZ;
    my $file=(<ZZ>);
    my @nums = (unpack "L*",$file);
    print "@nums\n";
    close ZZ;
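
If the input is small enough to hold in memory, the two examples above combine into a slurp followed by unpack "C*", which yields the numeric value of every byte in one call. A sketch, with byte_values being an illustrative name only:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp everything from $fh and return the list of
# byte values (0..255), one per byte of input.
sub byte_values {
    my ($fh) = @_;
    binmode $fh;                        # raw bytes
    local $/;                           # undef $/ for this scope: slurp mode
    my $data = <$fh>;
    $data = '' unless defined $data;    # empty input gives an empty list
    return unpack 'C*', $data;
}
```

Calling byte_values(\*STDIN) would handle piped input, and passing a handle from open covers the named-file case.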
Re: read data 1 byte at a time
by sweetblood (Prior) on Oct 28, 2003 at 20:43 UTC
    Thanks everyone! I have a much better understanding of this. All your suggestions have been helpful and the benchmarks are very revealing.