Hello Perl Monks,

I am working on an automation project where I have to read a binary file, parse it, and print it out in a readable format. The binary file can be up to 4-5 MB and can contain around 10,000 records. Each record starts with a 2-byte eye catcher, which is == (3d3d in hex). I have a template that tells me the length of each field in the record. The records are variable length because the last field can vary from record to record. The good thing is that the 2 bytes right after the eye catcher hold the length of the record.
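To make the layout concrete, here is a minimal sketch of the record header as described above. It assumes the length is an unsigned big-endian 16-bit value (pack template 'n') that counts the whole record, including the 4 header bytes. The helper names build_record and split_header are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a record as eye catcher, 2-byte length, payload.
# Assumption: the length is big-endian ('n') and includes the 4 header bytes.
sub build_record {
    my ($payload) = @_;
    return pack 'a2 n a*', '==', 4 + length($payload), $payload;
}

# Hypothetical helper: pull the eye catcher and the length out of a record.
sub split_header {
    my ($record) = @_;
    my ($eye, $length) = unpack 'a2 n', $record;
    return ($eye, $length);
}

my $record = build_record('ABCDEF');
my ($eye, $length) = split_header($record);

# prints: eye=== hex=3d3d length=10
printf "eye=%s hex=%s length=%d\n", $eye, unpack('H4', $record), $length;
```

If the real files store the length little-endian, the 'n' in both templates would become 'v'.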

My approach as of now is to read the first two bytes of the binary file and check whether they are my eye catcher. If they are, I read the next two bytes, which tell me the length of the record. I convert that length to decimal and use it in the read function to read the entire record, then pass it to a parsing subroutine which will split and convert each field of the record accordingly. The length includes the eye catcher and the length field itself, which is why I am doing $length - 4 (2 bytes of eye catcher and 2 bytes of length have already been read into the buffer).

#!/usr/local/bin/perl
use strict;
use warnings;

open my $fh, '<', 'binary.file' or die "Cannot open binary.file: $!";
binmode($fh);

# initialise both variables (a single "" on the right only sets the first one)
my ($xdr, $buffer) = ('', '');

# read until end of file ...
while (read($fh, $buffer, 2)) {

    # if file does not start with eye catcher skip until you find one
    next unless unpack('H*', $buffer) eq '3d3d';

    # append the eye catcher to xdr variable
    $xdr .= $buffer;

    # read next two bytes which is length of the record
    read($fh, $buffer, 2);

    # convert the binary length (big-endian 16-bit) to decimal for use in read function
    my $length = unpack 'n', $buffer;

    # append the length to xdr variable
    $xdr .= $buffer;

    # read the binary stream till the length of record - 4 bytes
    read($fh, $buffer, $length - 4);

    # append the entire xdr
    $xdr .= $buffer;

    # send to parsing subroutine for parsing

    $xdr = "";
}

My questions are:
1. Is this a good approach? How can I improve this?
2. Would it be better to read the entire binary file into an array, splitting at the eye catcher? Would reading the entire file into an array be a performance hit?
3. There can be bad records in the file where the length is wrong. So I need a check that only sends a record for parsing if, after the entire length of the record has been read, the next two bytes are 3d3d.
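The check in question 3 could be sketched roughly as follows, working on data that has already been read into a single scalar. This is only one possible shape, not a settled design. It assumes a big-endian 16-bit length that counts the 4 header bytes, and it treats a record as good only if another eye catcher follows it or it ends exactly at the end of the data. The name extract_records is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records whose length field checks out.
# Assumptions: big-endian 16-bit length that includes the 4 header bytes;
# a record is trusted only if it is followed by another eye catcher or
# ends exactly at the end of the data.
sub extract_records {
    my ($data) = @_;
    my @records;
    my $pos = 0;
    my $end = length $data;

    while ((my $start = index($data, '==', $pos)) >= 0) {
        last if $start + 4 > $end;              # truncated header
        my $length = unpack 'n', substr($data, $start + 2, 2);
        my $next   = $start + $length;

        my $ok = $length >= 4
              && ( $next == $end
                || ($next + 2 <= $end && substr($data, $next, 2) eq '==') );

        if ($ok) {
            push @records, substr($data, $start, $length);
            $pos = $next;           # continue right after this record
        }
        else {
            $pos = $start + 1;      # bad length: look for the next eye catcher
        }
    }
    return @records;
}

my $good1 = pack 'a2 n a*', '==', 7, 'abc';
my $bad   = pack 'a2 n a*', '==', 9, 'xy';   # claims 9 bytes but only has 6
my $good2 = pack 'a2 n a*', '==', 6, 'hi';

my @found = extract_records($good1 . $bad . $good2);
print scalar(@found), " valid records\n";    # prints: 2 valid records
```

The sketch works on an in-memory copy, which for a file of a few MB could be obtained by reading the handle with $/ set to undef. A payload that happens to contain the bytes 3d3d can still look like an eye catcher after a bad record, so the length check reduces false matches rather than ruling them out.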

If anything I have written is ambiguous, please let me know in the comments and I will update this question to make it clearer. I don't have any questions about parsing yet; it is just the reading I am most concerned about.

Looking forward to your wisdom.
Regards
Jaypal


In reply to Reading binary file in perl having records of different length by jaypal
