bh_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi..
I want to read a binary file and the binary files have File Header, data and File Trailer. File Header, File Trailer and Data is fix length.
But I have a problem to capture the File Trailer. Could somebody help me How could i read the files by following the file format ?

#!/usr/bin/perl -w use POSIX qw(strftime); use constant FILEHDR => 66; use constant DATALEN => 222; use constant FILETRL => 69; ... ... sub print_header { my $data = shift; my $loc = 0; my $nloc = 0; my $filehdr = unpack "A*", $data; my $fileval = unpack "H*", $data; print ("FILE HEADER\n"); for ($i = 0; $i < @hdrname; $i++) { my $len = $hdrsize[$i]; if ($hdrname[$i] =~ /newLine/) { printf (" %-20s : %-15s >%s<\n", $hdrname[$i +], substr($fileval,$nloc,$len*2), substr($fileval, $nloc, $len*2)); } else { printf (" %-20s : %-15s >%s<\n", $hdrname[$i +], substr($filehdr, $loc, $len), substr($fileval, $nloc, $len*2)); } $loc = $loc + $len; $nloc = $nloc + $len * 2; } } sub print_trailer { my $data = shift; my $loc = 0; my $nloc = 0; my $trlhdr = unpack "A*", $data; my $trlval = unpack "H*", $data; print ("FILE HEADER\n"); for ($i = 0; $i < @trlname; $i++) { my $len = $trlsize[$i]; if ($trlname[$i] =~ /newLine/) { printf (" %-20s : %-15s >%s<\n", $trlname[$i +], substr($trlval,$nloc,$len*2), substr($trlval, $nloc, $len*2)); } else { printf (" %-20s : %-15s >%s<\n", $trlname[$i +], substr($trlval, $loc, $len), substr($trlval, $nloc, $len*2)); } $loc = $loc + $len; $nloc = $nloc + $len * 2; } } sub display_data { my $data = shift; my $val; my $tmp; my $cdr; my $loc = 0; print ("\nDATA\n") if ($raw); for ($i = 0; $i < @dataname; $i++) { my $len = $datasize[$i]; my $fname = $dataname[$i]; if (uc($fname) =~ /RECORD TYPE|CANCEL TYPE/) { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } elsif (uc($fname) =~ /RECORD NUMBER/) { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } else { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } $cdr =~ s/\s+//g; $val =~ s/\s+//g; if ($raw) { printf (" %-20s: %-25s >%s<\n", $fname, $cdr +, $val); } elsif ($std) { printf (" %-20s: %-25s\n", $fname, $cdr); } else { if ($output) { printf (OUTPUT "%s,", $cdr); } else { printf ("%s,", $cdr); } } $loc = $loc + $len; if ($output) { print (OUTPUT "\n"); } else { print "\n"; } } if ($fname) { if (!defined ($input)) { $input = getcwd; } if (defined ($output)) { open (OUTPUT, ">$output/$fname.csv"); } open (DATA, "$input/$fname"); binmode DATA; read (DATA, $data, FILEHDR); print_header($data) if ($raw); while (read (DATA, $data, DATALEN) != 0) { my $hex = unpack "H*", $data; $rtype = hextoasc (substr($hex, 0, 2)); if ($rtype =~ /T/) { print_trailer($hex) if ($raw); } else { display_data($hex); } } close(DATA); close(OUTPUT) if ($output); } else { usage(); }
  • Comment on How to read a binary file with file header, data and file trailer ?
  • Download Code

Replies are listed 'Best First'.
Re: How to read a binary file with file header, data and file trailer ?
by BrowserUk (Patriarch) on Feb 03, 2010 at 09:39 UTC

    From your code, your files have a a 66-byte header; 0 or more 222-byte data sections; and a 69-byte trailer. And you know when you've read the trailer record because the first byte is 'T'.

    But, you unpack the data to hex before checking if the first 2 bytes are the hex for 'T'.

    If it is, you then passed the hexified data into print_trailer(), where you unpack it again. Twice! Once as TEXT ('A*'); once as HEX ('H*') again!

    You then attempt to process those, conditionally upon a bunch of mystery variables, sometimes accessing the ASCIIfied hex; and sometimes the HEXified hex. I suspect that at the very least you should be passing the raw data (not its hexified representation) into print_trailer().

    And that then calls into question, why are you hexifying it in the first place, if all you want to do is test if the first character is 'T'? A simple if( $data =~/^T/ ) { ... would do that nicely, and save you having to substr 2 bytes of the hex and then convert those back to ascii.

    And you're doing yourself no favours by using variables (package globals) within your subs that are not passed into them via the parameters. And by not showing the full script, or at least a runnable cut-down of it, you make life neigh impossible for us to help you beyond the above observations.

    That's why I prefer to put my subs at the very top of my scripts where it's impossible to accidentally create closures.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi..
      I am sorry because not produced my full script. This is my mistake and I am very sorry. Below are my full script..

      #!/usr/bin/perl -w use strict; use warnings; use POSIX qw(strftime); use constant FILEHDR => 66; use constant DATALEN => 222; use constant FILETRL => 69; use Cwd; use Getopt::Long; my @hdrname = ( "Record Type", "Date", "Record Size", "Number Record", "Application", "Version", "Data Rec. Type", "Extra", "Byte", "newLine" ); my @hdrsize = (3, 14, 4, 8, 16, 5, 3, 8, 4, 2); my $data; my @trlname = ( "Record Type", "Date", "Record Size", "Number Record", "Application", "Version", "Data Rec. Type", "Extra", "newLine" ); my @trlsize = (6, 28, 8, 14, 32, 10, 6, 32, 2); my @dataname = ( "Record Type", "IRVC Key", "Card Number", + "Serial Number", "Issue Code", "Lot Number", "Source Zone", + "Destination Zone", "Caller Number", "Dialed Number", "Convention Ti +me", "Call connect Time", "Call Release Time", "Valued Deducted", "Breakage Valu +e", "MBI", "Carrier", "Carriage Return" ); my @datasize = (4, 24, 32, 32, 32, 16, 32, 32, 40, 40, 24, 28, 28, 18, 18, 10, 32, 2 ); my ($help,$raw,$std,$input,$output,$fname); GetOptions ( "h|help" => \$help, "raw|r" => \$raw, "std|s" => \$std, "input|i=s" => \$input, "output|o=s" => \$output, "filename|f=s" => \$fname, ) or usage(); sub usage { if ($help) { open (HELP, ">/tmp/help.txt"); print (HELP "USAGE: $0 -[hrs] -i <input dir> -o <outpu +t dir> -f <input_file>\n"); print (HELP "Try \"-h\" or \"--help\" for more informa +tion\n\n"); print (HELP "NAME\n"); print (HELP "\t$0 - encode SINGTEL Data Format\n\n"); print (HELP "DESCRIPTION\n"); print (HELP "\t-f or --filename : Assigned input file +name\n"); print (HELP "\t-i or --input : Assigned input direc +tory for the input file\n"); print (HELP "\t-o or --output : Assigned output dire +ctory\n"); print (HELP "\t-r : Display raw record\n +"); print (HELP "\t-s : Display standard rec +ord\n\n"); print (HELP "EXAMPLE\n"); print (HELP "\t>> Display SINGTEL record in standard o +utput\n"); print (HELP "\t\t $0 -s -f UNICA.0352.20091214\n\n"); print (HELP "\t>> Display SINGTEL record including raw + record\n"); print (HELP "\t\t $0 -r -f UNICA.0352.20091214\n\n"); print (HELP "\t>> Convert SINGTEL record into CSV file + format\n"); print (HELP "\t\t $0 -f UNICA.0352.20091214\n\n"); print (HELP "\t>> Convert SINGTEL record into CSV form +at and save into output directory\n"); print (HELP "\t\t $0 -i /input/directory/name/only -f +UNICA.0352.20091214 -o /output/directory/name/only\n\n"); close (HELP); system ("more /tmp/help.txt"); system ("rm -f /tmp/help.txt"); } else { print ("USAGE: $0 -[hrs] -i <input dir> -o <output dir +> -f <input_file>\n"); print ("Try \"-h\" or \"--help\" for more information\ +n\n"); } } # Convert each two-digit hex number back to an ASCII character. sub hextoasc { (my $asc = shift) =~ s/([a-fA-F0-9]{2})/chr(hex $1)/eg; return $asc; } sub print_header { my $data = shift; my $loc = 0; my $nloc = 0; my $filehdr = unpack "A*", $data; my $fileval = unpack "H*", $data; print ("FILE HEADER\n"); for (my $i = 0; $i < @hdrname; $i++) { my $len = $hdrsize[$i]; if ($hdrname[$i] =~ /newLine/) { printf (" %-20s : %-15s >%s<\n", $hdrname[$i +], substr($fileval,$nloc,$len*2), substr($fileval, $nloc, $len*2)); } else { printf (" %-20s : %-15s >%s<\n", $hdrname[$i +], substr($filehdr, $loc, $len), substr($fileval, $nloc, $len*2)); } $loc = $loc + $len; $nloc = $nloc + $len * 2; } } sub print_trailer { my $data = shift; my $loc = 0; my $trlhdr = unpack "A*", $data; my $trlval = unpack "H*", $data; print ("FILE HEADER\n"); for (my $i = 0; $i < @trlname; $i++) { my $len = $trlsize[$i]; if ($trlname[$i] =~ /newLine/) { printf (" %-20s : %-15s >%s<\n", $trlname[$i +], substr($trlval,$loc,$len*2), substr($trlval, $loc, $len)); } else { printf (" %-20s : %-15s >%s<\n", $trlname[$i +], hextoasc(substr($trlhdr, $loc, $len)), substr($trlhdr, $loc, $len) +); } $loc = $loc + $len; } } sub display_data { my $data = shift; my $val; my $tmp; my $cdr; my $loc = 0; print ("\nDATA\n") if ($raw); for (my $i = 0; $i < @dataname; $i++) { my $len = $datasize[$i]; my $fname = $dataname[$i]; if (uc($fname) =~ /RECORD TYPE|CANCEL TYPE/) { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } elsif (uc($fname) =~ /RECORD NUMBER/) { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } else { $cdr = hextoasc (substr($data, $loc, $len)); $val = substr($data, $loc, $len); } $cdr =~ s/\s+//g; $val =~ s/\s+//g; if ($raw) { printf (" %-20s: %-25s >%s<\n", $fname, $cdr +, $val); } elsif ($std) { printf (" %-20s: %-25s\n", $fname, $cdr); } else { if ($output) { printf (OUTPUT "%s,", $cdr); } else { printf ("%s,", $cdr); } } $loc = $loc + $len; } if ($output) { print (OUTPUT "\n"); } else { print "\n"; } } #################################### MAIN ############################ # $dir = getcwd; if ($fname) { if (!defined ($input)) { $input = getcwd; } if (defined ($output)) { open (OUTPUT, ">$output/$fname.csv"); } open (DATA, "$input/$fname"); binmode DATA; read (DATA, $data, FILEHDR); print_header($data) if ($raw); while (read (DATA, $data, DATALEN) != 0) { my $hex = unpack "H*", $data; my $rtype = hextoasc (substr($hex, 0, 2)); if ($rtype =~ /T/) { print_trailer($hex) if ($raw); } else { display_data($hex); } } close(DATA); close(OUTPUT) if ($output); } else { usage(); }

        All of my other observations still apply! In particular, your unpacking of the trailer three times means that you are ending up with garbage.

        For example, one of your trailer fields is called 'Date' and is 28 characters long. Reading between the lines, I'm going to assume that this is both the date & time, and it is being stored in the format "YYYY/MM/DD HH:MM:SS.ssssss'. That's a a couple short, so it obviously the wrong format, but that doesn't matter for the purposes of this demonstration. Lets see what happens to that dat when you unpack 3 times as you are doing:

        # The raw data as you might read it fro the file $raw = '2010/02/03 10:52:03.123456'; print length $raw;; 26 ## First unpack in your main loop $level1hex = unpack 'H*', $raw; print $level1hex;; 323031302f30322f30332031303a35323a30332e313233343536 ## Now re-unpack it two different ways in you print_trailer() sub $level2text = unpack 'A*', $level1hex; print $level2text;; 323031302f30322f30332031303a35323a30332e313233343536 $level2hex = unpack 'H*', $level1hex; print $level2hex;; 3332333033313330326633303332326633303333323033313330336133353332336133 +3033333265333133323333333433353336

        As you can see, you're ending up with garbage.

        The first thing you need to do is understand what unpack does, with it various templates. As another small example, your code defines the trailer as

        my @trlname = ( "Record Type", "Date", "Record Size", "Number Record", "Application", "Version", "Data Rec. Type", "Extra", "newLine" ); my @trlsize = (6, 28, 8, 14, 32, 10, 6, 32, 2);

        And the first thing to say about that, is that this isn't "binary" data. But rather just ascii (text) data in fixed length fields. (Probably from some COBOL system?)

        The clues here are that an 8-byte (64-bit) binary field can store dates covering 500,000 years in microseconds. So a 224-bit binary date field could cover the entire history of the Universe since the Big Bang in femtoseconds--and then some. I heard of forward planning, but that's unlikely to be a date stored in binary.

        Equally, an 8-byte binary "Record size" field would allow for single records upto 18446744073709551616 bytes each. And that is (probably) more storage than all the disk drives, CDs, DVDs & BluRays every manufactured could hold. For each RECORD, I think not :)

        The same goes for the recognisible fields, so the probability is that they are all just fixed length ASCII fields. In other words, to display them all you really need to do is pass $data (not $hex) into your print_trailer() sub and then just print $data And the results would be human readable. (Though possibly not nicely formatted!)

        If you really want to unpack the fields, a simple:

        my @fields = unpack 'A6 A28 A8 A14 A32 A10 A6 A32 A2', $data; print $trlname[ $_ ], ' : ', $fileds[ $_ ] for 0 .. $#fields;

        Is probably all you need.

        Ie. (Untest modifications; I don't have an appropriate data file):

        sub print_trailer { my $data = shift; my @fields = unpack 'A6 A28 A8 A14 A32 A10 A6 A32 A2', $data; print $trlname[ $_ ], ' : ', $fields[ $_ ] for 0 .. $#fields; } ... while (read (DATA, $data, DATALEN) != 0) { if ($data =~ /^T/) { print_trailer($data) if ($raw); } else { display_data($hex); } }

        Note:I haven't attempted to examine display_data($hex) in detail, but similar modifications would be required there.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How to read a binary file with file header, data and file trailer ?
by Anonymous Monk on Feb 03, 2010 at 09:11 UTC
    You're missing the binary file format description... You're also missing error checking (see autodie) ... you're probably also missing strict and warnings from your real code

      Hi
      Thanks for your reply, I have updated use strict and warning to my script.