Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Riddle me this my Monkish friends.

I am currently trying to write a Perl script to parse a binary file. Using unpack is no problem (I am still generating all the templates, but that is not the issue at hand).

The file is made up of records; each record consists of a header and a data portion. The header gives the length of the data portion. No problem so far; see below.

The program reads the header, calculates the size of the remaining data, then reads that data.

The problem that I run into is that the system seems to read twice the size I am requesting, i.e. my first read is bytes 0-20; assuming the data portion is 100 bytes, it should then read 21-120. But instead it reads 0-20, skips 21-40, reads 41-140, skips 141-240, then reads the next header.

Any ideas on this? I have tried both read and sysread and I get slightly different results, but neither works properly.

System:
Win NT
ActiveState Perl

Thanks

Re: Reading Binary file confusion
by jmcnamara (Monsignor) on Mar 11, 2002 at 22:17 UTC

    I think your template may not be right. You read 20 bytes with your first read, however, the template only unpacks 11 bytes of data.
    print length pack 'H4C3H4b12H4'; # prints 11

    H4 : 4 nybbles = 2 bytes
    C3 : 3 chars   = 3 bytes
    H4 : 4 nybbles = 2 bytes
    b12: 12 bits!! = 2 bytes
    H4 : 4 nybbles = 2 bytes
                     --
                     11 bytes

    But perhaps you wish to ignore the other 9 bytes in the header.

    Unpacking the third field with "H4" gives you a hex string so you should probably use hex() to convert it to an integer. However, if it is a count of the blocks to follow then you could more easily unpack it with "v" or "n" depending on the endianness.
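To make the difference between "H4", "v" and "n" concrete, here is a minimal sketch. The two sample bytes are an assumption for illustration (a count of 3 stored low byte first), not data from the original file.

```perl
use strict;
use warnings;

# Hypothetical 2-byte field holding the value 3, low byte first.
my $field = "\x03\x00";

my $hex_string = unpack 'H4', $field;   # hex digits in byte order: "0300"
my $from_hex   = hex $hex_string;       # digits read as one number: 768
my $little     = unpack 'v', $field;    # unsigned 16-bit, little-endian: 3
my $big        = unpack 'n', $field;    # unsigned 16-bit, big-endian: 768

print "H4: $hex_string, hex(): $from_hex, v: $little, n: $big\n";
```

With these bytes, hex() on the "H4" string agrees with "n", so whether that is the right answer depends on the byte order the file actually uses.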

    Also in your debug code you can use "H*" as a template for an unknown number of bytes.
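A short sketch of the "H*" tip, using made-up sample bytes. It also shows why building the template from the byte length can mislead: the count after "H" is in nybbles, so "H" followed by the length in bytes dumps only about half of the buffer.

```perl
use strict;
use warnings;

# Hypothetical sample bytes standing in for a data portion.
my $data = "\x12\x34\xab\xcd\xef";

# "H*" consumes every byte, whatever the length.
my $dump = unpack 'H*', $data;

# "H5" means 5 nybbles, not 5 bytes, so the dump is cut short.
my $half = unpack 'H' . length($data), $data;

print "H*: $dump\n";
print "H", length($data), ": $half\n";
```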

    --
    John.

      You are right, my template is off, but I will have to clarify this. However it seems to work out when it checks the header ID, and block size. The other 9 bytes may have been held for future development. I have to tramp through the history for this.

      For the third field I use "H4" because the data was returned properly. I don't believe the number of blocks will ever exceed 8, but I guess I should use the proper form anyway.

      Thanks for the tip on H*. This is my first attempt at pack and unpack, so my steps are still a little unsteady.
Re: Reading Binary file confusion
by Anonymous Monk on Mar 11, 2002 at 21:23 UTC
    Sorry here's some example code
    #! c:/perl/bin/perl
    use strict;
    use warnings;

    #=======================
    # Define VARS
    #=======================
    my $RECORD_SIZE = 40;
    my $ID     = 0;
    my $size   = 20;              # Num of bytes to read from file
    my $offset = 0;               # Offset to file reading.
    my $MASK   = 'H4C3H4b12H4';
    my $file   = "FileName";
    my $data;

    open( FILE, $file ) || die "Can not open file";   # open file
    binmode FILE;                                     # Turn on binary mode

    # Read header for the record.
    while ( sysread( FILE, $data, $size ) ) {    # Filehandle,string,length,offset
        my ($flag,$format,$blocks,$source,$no,$time,$type) = unpack $MASK, $data;
        print "Header: $flag\n";
        print "Header Data:", unpack $MASK, $data;
        print "\n";

        # Check integrity of the header.
        if ( $flag != 1234 ) {
            die "Ill formed header. File may be corrupt.";
        }

        # Read the remainder of the record.
        $size = ( ( $RECORD_SIZE * $blocks ) - 20 );
        sysread( FILE, $data, $size );    # Filehandle,string,length,offset

        # Debugging stuff to match Hex editor to file to determine what is going on.
        my $long = length($data);
        print "Size of Data portion of record: $long \n";
        my $temp = "H" . $long;
        #print $temp;
        my $alldata = unpack $temp, $data;
        print "Data: $alldata\n";
    }
    close FILE;
      I think your header is a fixed size? It reads correctly for the first header, but then you change $size in the loop for the data portion. For the next header, $size is incorrect, having the size of the last data portion. I quit looking when I saw this, it may or may not fix all your problems.

      YuckFoo
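To illustrate the point about $size, here is a minimal, self-contained sketch that keeps the header length and the data length in separate variables. The record layout is an assumption made up for the example (a 16-bit ID, a one-byte block count, 17 pad bytes), not the poster's real format, and the names $HEADER_SIZE, $HEADER_ID and $data_size are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical layout, not the poster's real format: a 20-byte header
# (16-bit ID, one-byte block count, 17 pad bytes), followed by data that
# fills the rest of $blocks blocks of 40 bytes each.
my $HEADER_SIZE = 20;
my $RECORD_SIZE = 40;
my $HEADER_ID   = 0x1234;

# Write two sample records (1 block and 2 blocks) to a scratch file.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
for my $blocks (1, 2) {
    print {$out} pack('n C x17', $HEADER_ID, $blocks);
    print {$out} 'x' x ($RECORD_SIZE * $blocks - $HEADER_SIZE);
}
close $out or die "Cannot close $file: $!";

open my $in, '<', $file or die "Cannot open $file: $!";
binmode $in;

my @data_sizes;
# The header length is a constant, so the loop never overwrites it.
while (read($in, my $header, $HEADER_SIZE)) {
    my ($id, $blocks) = unpack 'n C', $header;
    die "Ill-formed header. File may be corrupt.\n" unless $id == $HEADER_ID;

    # The data length gets its own variable for each record.
    my $data_size = $RECORD_SIZE * $blocks - $HEADER_SIZE;
    my $got = read($in, my $data, $data_size);
    die "Short read in data portion\n"
        unless defined $got && $got == $data_size;

    push @data_sizes, length $data;
}
close $in;

print "Data sizes: @data_sizes\n";
```

Because the header read always asks for $HEADER_SIZE bytes, the second header is read at the right offset no matter how large the previous data portion was.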

        I have serious egg on my face. YuckFoo is correct: after pulling my data I did not reset $size back to the size of the header. Once I did this, my sample script ran through the entire binary file, checking the header IDs along the way, with no problem. I will still have to go back and verify this, but my problem seems to be solved. Thank you all!!!

        As a side note, I have always been impressed that no matter what the question may be, or in my case how dense the writer of the question, someone always answers whatever gets posted. This may not always be the case, but it has been for me. Thank you all. It is this type of support that makes this site what it is: one of my favorites.