Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am trying to read a binary file in byte by byte. My code is as follows:
    my $offset = 0;
    my $buffer;
    open (FILE, "file.binary");
    binmode FILE;
    while ( read(FILE, $buffer, 1, $offset++) ) {
        ... Code here ...
        $buffer = "";
    }
However, it's not working as I thought it would. Instead of $buffer being filled with a single byte value, it is padded with as many null bytes as the current offset. For example, if I have the binary file:

\aa\bb\cc\dd\ee

and I run the above code against it, I'll get the following values for $buffer:

$buffer = \aa

$buffer = \00\bb

$buffer = \00\00\cc

$buffer = \00\00\00\dd

etc.

What I really want is this:

$buffer = \aa

$buffer = \bb

$buffer = \cc

$buffer = \dd

etc.

What makes this more bizarre is that if I undef $buffer every time, it still adds those padding null bytes. I've tried everything, including undef, setting $buffer to "", and reinitializing the buffer (i.e. read(FILE, my $buffer, 1, $offset++)), but I can't seem to figure this out. Any suggestions?

Re: Reading binary file byte by byte
by ELISHEVA (Prior) on Dec 21, 2010 at 13:33 UTC

    The offset parameter refers to where you want Perl to place your data in the buffer, not the offset into the file. The docs say that it will zero pad the buffer if you specify a starting point that is different from 0 and the string has no characters in it already. See read and sysread. That is why you are seeing all those nulls.
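The padding described above is easy to reproduce. The following is a minimal sketch, not the original poster's code: it assumes a throwaway three-byte sample file created with File::Temp, and the variable names ($padded, $plain) are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway binary file (hypothetical sample data).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xaa\xbb\xcc";
close $tmp_fh or die "close: $!";

open my $fh, '<:raw', $tmp_name or die "open '$tmp_name': $!";

# OFFSET = 2 on an empty buffer: Perl pads positions 0 and 1 with "\0",
# then stores the byte it read at position 2.
my $padded = '';
read($fh, $padded, 1, 2) or die "short read";

# No OFFSET: the buffer is simply replaced by the next byte from the file.
my $plain;
read($fh, $plain, 1) or die "short read";

close $fh;

printf "padded: %s\n", join ' ', map { sprintf '%02x', $_ } unpack 'C*', $padded;
printf "plain:  %s\n", join ' ', map { sprintf '%02x', $_ } unpack 'C*', $plain;
```

With the sample data above, the first buffer holds 00 00 aa and the second holds just bb, which matches the behaviour in the question.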

    To control the position within the file one needs to use seek, but that's only when one is going back and forth within a file. For the purposes of a simple sequential read, Perl keeps track of the position pointer all on its own and increments it after each read.
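As a rough sketch of the difference, the snippet below first reads a file sequentially without any offset, then uses seek to jump to a specific file position. The sample file and the variable names (@sequential, $fourth, $pos_after) are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Fcntl qw(SEEK_SET);

# Hypothetical sample file containing the five bytes from the question.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xaa\xbb\xcc\xdd\xee";
close $tmp_fh or die "close: $!";

open my $fh, '<:raw', $tmp_name or die "open '$tmp_name': $!";

# Sequential read: no offset argument, Perl advances the file position itself.
my @sequential;
while (read($fh, my $byte, 1)) {
    push @sequential, sprintf '%02x', ord $byte;
}

# Random access: seek sets the position in the FILE (not in the buffer).
seek($fh, 3, SEEK_SET) or die "seek: $!";
read($fh, my $fourth, 1) or die "short read";
my $pos_after = tell($fh);    # position just past the byte we read

close $fh;

print "sequential: @sequential\n";
printf "byte at file offset 3: %02x, position afterwards: %d\n",
    ord $fourth, $pos_after;
```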

    The offset parameter is normally used when you want to append data to a string or have a pipe where you want to make sure that you don't overwrite outgoing data at the front of the string with incoming data placed at the end.
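One way the append case might look: pass length($data) as the offset so each read lands at the end of what has already been collected. This is only a sketch; the chunk size of 2 and the sample file are arbitrary choices for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xaa\xbb\xcc\xdd\xee";
close $tmp_fh or die "close: $!";

open my $fh, '<:raw', $tmp_name or die "open '$tmp_name': $!";

my $data   = '';
my $chunks = 0;
while (1) {
    # OFFSET = current length, so new bytes are appended, not overwritten.
    my $got = read($fh, $data, 2, length $data);
    die "read: $!" unless defined $got;
    last if $got == 0;    # end of file
    $chunks++;
}
close $fh;

printf "%d chunks, %d bytes: %s\n", $chunks, length $data,
    join ' ', map { sprintf '%02x', $_ } unpack 'C*', $data;
```

Because the offset always equals the current length of the string, no null padding is ever introduced.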

Re: Reading binary file byte by byte
by anonymized user 468275 (Curate) on Dec 21, 2010 at 16:43 UTC
    The implication of not expecting a logical record size other than 1 is that you don't want to be doing fixed length I/O in the first place. There is nothing wrong with doing ordinary variable length I/O even if there is no delimiter. Just set $/ to something likely to occur and process each string of bytes read in singly, e.g.
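    {
        local $/ = chr(0);    # if this occurs reasonably often in the file
        open (FILE, "file.binary");
        binmode FILE;         # avoid newline translation on binary data
        while ( <FILE> ) {
            while ( length() ) {
                my $byte;
                # /s lets . match "\n" too, so newline bytes are not lost
                ( $byte, $_ ) = /^(.)(.*)$/s;
                ... Code here, processing $byte ...
            }
        }
        close FILE;
    }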

    One world, one people

      $/ is more flexible than you think.

      {
          open(my $FILE, "file.binary") or die $!;
          binmode($FILE);
          local $/ = \1;
          while ( my $byte = <$FILE> ) {
              ...
          }
          close $FILE;
      }

      But messing with global variables is messy. What if "..." calls a function that reads from a file? Just use read.

      {
          open(my $FILE, "file.binary") or die $!;
          binmode($FILE);
          while (read($FILE, my $byte, 1)) {
              ...
          }
          close $FILE;
      }
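The concern about $/ being global can be shown with a small sketch. Here first_line is a hypothetical helper standing in for whatever the "..." might call; because local is dynamically scoped, the helper sees the caller's $/ and reads one-byte records instead of a line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns the first "line" of a file,
# relying on the default meaning of $/.
sub first_line {
    my ($path) = @_;
    open my $in, '<', $path or die "open '$path': $!";
    my $line = <$in>;
    close $in;
    return $line;
}

# Hypothetical sample text file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\nworld\n";
close $tmp_fh or die "close: $!";

my $normal = first_line($tmp_name);    # $/ is "\n" here

my $inside;
{
    local $/ = \1;                     # one-byte records, as in the loop above
    $inside = first_line($tmp_name);   # the helper inherits $/ = \1
}

printf "normal: %d bytes, inside local \$/ block: %d byte(s)\n",
    length $normal, length $inside;
```

Using read with an explicit length, as in the reply above, sidesteps this because it does not depend on $/ at all.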
        Then we are back to reading a byte at a time from a file, which deserves some thought about performance first. It might be better than sysread, which forces a system call with a 1-byte buffer if that's what you request.

        Either way, if using buffered I/O, it seems more apt to read at least a page of memory at a time from the file and then process that block one byte at a time.
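A minimal sketch of that block-at-a-time approach follows. The 4096-byte block size is an assumed page size, and the sample file, $count and $sum are made up for the example; the per-byte work here is just a counter and a running sum.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file: 10,000 bytes, each with the value 1.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\x01" x 10_000;
close $tmp_fh or die "close: $!";

my $block_size = 4096;    # assumed page size; adjust for your platform

open my $fh, '<:raw', $tmp_name or die "open '$tmp_name': $!";

my ($count, $sum) = (0, 0);
while (1) {
    my $got = read($fh, my $block, $block_size);
    die "read: $!" unless defined $got;
    last if $got == 0;    # end of file

    # unpack 'C*' turns the block into a list of unsigned byte values.
    for my $value (unpack 'C*', $block) {
        $count++;         # placeholder for real per-byte processing
        $sum += $value;
    }
}
close $fh;

print "processed $count bytes, sum of values $sum\n";
```

Note that read on an ordinary filehandle already goes through Perl's I/O buffering, so a one-byte read does not mean one system call per byte; reading in blocks mainly saves the per-call overhead inside Perl.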

        One world, one people

      Another way, using fixed-length I/O (and also avoiding use of a regex to step through the string byte-by-byte):

      >perl -wMstrict -le "{ local $/ = \10; my $fname = 'junk'; open my $fh, '<:raw', $fname or die qq{opening '$fname': $!}; while (defined(my $buffer = <$fh>)) { printf '%02x ', ord substr $buffer, $_, 1 for 0 .. length($buffer) - 1; print ''; } close $fh; } "
      61 62 63 64 65 66 67 68 69 6a
      6b 6c 6d 6e 6f 70 71 72 73 74
      75 76 77 78 79 7a 0d 0a

      Update: BTW: The test file was created with
          perl -e "print 'a' .. 'z', qq{\n}" > junk