jaypal has asked for the wisdom of the Perl Monks concerning the following question:
Hello Perl Monks,
I am working on an automation project where I have to read a binary file, parse it and print it out in a readable format. The binary file can be up to 4-5 MB in size and can contain around 10,000 records. Each record starts with a 2-byte eye catcher, which is == (or 3d3d in hex). I have the template which tells me the length of each field in the record. I call the records variable length because the last field can vary in length from record to record. The good thing is that right after the 2-byte eye catcher comes the 2-byte length of the record.
So my approach as of now is to read the first two bytes of the binary file and check whether they are my eye catcher. If they are, I read the next two bytes, which tell me the length of my record. I convert that length to decimal and use it in the read function to read the entire record, then pass it to a parsing subroutine which will split and convert each field of the record accordingly. The record length includes the eye catcher and the length field, which is why I am doing $length - 4 (2 bytes of eye catcher and 2 bytes of length have already been read into the buffer).
#!/usr/local/bin/perl
use strict;
use warnings;

open my $fh, '<', 'binary.file' or die "File not found: $!";
binmode($fh);

my ($xdr, $buffer) = ('', '');

# read until end of file ...
while (read($fh, $buffer, 2)) {
    # if file does not start with eye catcher skip until you find one
    next unless $buffer eq '==';

    # append the eyecatcher to xdr variable
    $xdr .= $buffer;

    # read next two bytes which is length of the record
    read($fh, $buffer, 2) == 2 or last;

    # convert the binary length (16-bit, big-endian) to decimal for use in read function
    my $length = unpack 'n', $buffer;

    # append the length to xdr variable
    $xdr .= $buffer;

    # read the binary stream till the length of record - 4 bytes
    read($fh, $buffer, $length - 4);

    # append the entire xdr
    $xdr .= $buffer;

    # send to parsing subroutine for parsing

    $xdr = "";
}
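To make the length arithmetic concrete, here is a small worked example on a made-up record. It is only a sketch: the sample payload and the variable names are hypothetical, and it assumes the length field is an unsigned 16-bit big-endian integer, which is what interpreting the two bytes as a hex string implies.

```perl
use strict;
use warnings;

# Hypothetical sample record: eye catcher, 2-byte big-endian length, payload.
# The length (10) counts the whole record, including the 4 header bytes.
my $record = '==' . pack('n', 10) . 'ABCDEF';

# 'a2' takes the 2 raw eye-catcher bytes; 'n' decodes an unsigned
# 16-bit big-endian integer.
my ($eye, $length) = unpack 'a2 n', $record;

my $remaining = $length - 4;                     # bytes left after the header
my $payload   = substr $record, 4, $remaining;

print "$eye $length $remaining $payload\n";      # prints "== 10 6 ABCDEF"
```

If the length field turned out to be little-endian instead, the 'v' template would be the one to use in place of 'n'.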
My questions are:
1. Is this a good approach? How can I improve it?
2. Would it be better to read the entire binary file into an array, splitting at the eye catcher? Would reading the entire file into an array be a performance hit?
3. There can be bad records in the file where the length is wrong, so I need a check that only sends a record for parsing if, after the entire length of the record has been read, the next two bytes are 3d3d. How should I do that?
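To illustrate what I mean in points 2 and 3, here is a rough sketch that works on the whole file held in one string and accepts a record only when its length field is confirmed by what follows it. The helper name extract_records and the sample data are hypothetical, and I am assuming a big-endian length field and that a record ending exactly at the end of the data is also acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records in $data whose length field is
# confirmed by an eye catcher (or the end of the data) right after the record.
sub extract_records {
    my ($data) = @_;
    my @records;
    my $pos = 0;
    my $end = length $data;

    while ($pos + 4 <= $end) {
        # Resync one byte at a time until an eye catcher is found.
        if (substr($data, $pos, 2) ne '==') {
            $pos++;
            next;
        }
        my $length = unpack 'n', substr($data, $pos + 2, 2);
        my $next   = $pos + $length;

        # Accept the record only if it fits in the data and is followed by
        # another eye catcher, or ends exactly at the end of the data.
        my $ok = $length >= 4
              && $next <= $end
              && ($next == $end || substr($data, $next, 2) eq '==');
        if ($ok) {
            push @records, substr($data, $pos, $length);
            $pos = $next;
        }
        else {
            $pos++;    # bad length: skip this eye catcher and rescan
        }
    }
    return @records;
}

# Two good records around one whose length field (99) is wrong.
my $data = '==' . pack('n', 6)  . 'AB'
         . '==' . pack('n', 99) . 'XY'
         . '==' . pack('n', 7)  . 'CDE';
my @records = extract_records($data);
printf "%d valid records\n", scalar @records;    # prints "2 valid records"
```

Scanning one byte at a time while resyncing means an eye catcher at an odd offset is not missed, which reading two bytes at a time could do. The check is not watertight: a == that happens to occur inside a payload could still be mistaken for an eye catcher.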
If anything I have written is ambiguous, please let me know in the comments and I will update this question to make it clearer. I don't have any questions about parsing yet; it is just the reading I am most concerned about.
Looking forward to your wisdom.
Regards
Jaypal