in reply to yet another "reading binary data" question

Are the quantities x, y, z and n known beforehand? If so, you can do most of this with a single unpack call. Some of the values might need to be fixed up after they are parsed.

Here's probably how I would go about it:

my ($x, $y, $z, $n, $len) = ... # fill in these parameters
my $format  = "V$x n$y n C$z " . ("Z$len" x $n);
my $recsize = 4*$x + 2*$y + 2 + $z + ($n*$len);
open B, '<', 'binary_file.bin' or die ...;
binmode B;    # no newline translation on platforms that do it
my $buf;
while (read(B, $buf, $recsize) == $recsize) {
    my @values = unpack($format, $buf);
    my @x    = splice(@values, 0, $x);    # snip off the 4-byte ints
    my @y    = splice(@values, 0, $y);    # the 2-byte ints
    my $bits = splice(@values, 0, 1);
    my @z    = splice(@values, 0, $z);    # the 1-byte ints
    # @values now contains just the null-terminated strings
    # manually fix up signed vs. unsigned shorts in @y:
    $y[1] -= 65536 if ($y[1] > 32767);
    # repeat for each signed value
    ...process the record...
}
Admittedly, fixing up the shorts is a kludge. Unfortunately, there doesn't seem to be a plain format letter for big-endian signed values (at least not before the < and > modifiers that arrived with Perl 5.10), which is why I opted for this approach. I prefer the fixed-byte-order formats n and v for shorts over the native-order formats s and S to make the code, well, platform-independent.
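
On a perl that does have the > modifier (5.10 or later), the fix-up can be dropped for the signed fields. The sketch below compares the two approaches on a made-up three-short buffer; the sample values and the variable names @fixed and @direct are only for illustration and are not from the original record layout.

```perl
use strict;
use warnings;
use 5.010;    # the "<" and ">" pack modifiers need Perl 5.10 or later

# Hypothetical sample data: three big-endian 16-bit values.
# 0xFFFE is the two's-complement bit pattern for -2.
my $buf = pack('n3', 1, 0xFFFE, 40000);

# Approach from the post: unpack everything unsigned,
# then fix up the one field that is really signed.
my @fixed = unpack('n3', $buf);
$fixed[1] -= 65536 if $fixed[1] > 32767;

# Alternative: "s>" is a signed 16-bit value in big-endian order,
# so only the signed field gets that letter; the others stay "n".
my @direct = unpack('n s> n', $buf);

print "@fixed\n";     # 1 -2 40000
print "@direct\n";    # 1 -2 40000
```

The cost is that the format string can no longer be written as a simple "n$y"; each signed field needs its own s> at the right position.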

Update: Fixed format based on alexm's comment.
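
A mismatch between the format string and the hand-computed record size can be caught mechanically. One possible check, sketched here with made-up field counts (the numbers assigned to $x, $y, $z, $n and $len are assumptions, not the real layout), is to pack a record of zeros with the same format and compare its length against the arithmetic:

```perl
use strict;
use warnings;

# Hypothetical field counts, for illustration only.
my ($x, $y, $z, $n, $len) = (2, 3, 4, 2, 8);

my $format  = "V$x n$y n C$z " . ("Z$len" x $n);
my $by_hand = 4*$x + 2*$y + 2 + $z + ($n*$len);

# One zero per field: packing them yields a single record whose
# length is the size the format string actually describes.
my $nfields = $x + $y + 1 + $z + $n;
my $derived = length pack($format, (0) x $nfields);

die "format describes $derived bytes, arithmetic says $by_hand\n"
    unless $derived == $by_hand;
print "record size: $derived bytes\n";
```

With these counts both numbers come out to 36. Writing v where V was meant would make the packed length 4 bytes shorter than the arithmetic, and the die would fire.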

Re^2: yet another "reading binary data" question
by alexm (Chaplain) on May 07, 2008 at 19:46 UTC
    my $format = "v$x n$y n C$z ".("Z$len" x $n);
    my $recsize = 2*$x + 2*$y + 2 + $z + ($n*$len);
    Shouldn't it be $format = "V$x ..." and $recsize = 4*$x + ... ?
      Yeah, you're right - got too preoccupied with the shorts and forgot about the longs.
Re^2: yet another "reading binary data" question
by dwalin (Monk) on May 07, 2008 at 19:37 UTC
    yes, the record format is indeed known beforehand. it is fixed, and all field quantities and the overall record length are known. in fact, i already have something resembling your code, though slightly more generalized. but it doesn't really matter if i can't use one unpack for them all; one kludge or two, it is still kludgy code. i'm not after effectiveness this time, more like beauty. :)
    thanks for the reply, anyway.