in reply to Converting Files

Courtney, the key to solving your problem is to understand the format of the binary file. If it's a common numerical analysis format, you just might get lucky and find a module on CPAN that will do the work for you.

If not, then you'll need to read the file in and deal with it byte by byte.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $buf;
    my @barray;
    my $byt;
    my $acc = 0;

    # file 'datafile.bin' must exist and be in same directory as program file.
    open BFILE, "<./datafile.bin" or die "Can't find or open binary data file.\n";

    # not necessary on UNIX but helps with line-delineated files on Doze.
    binmode BFILE;

    # program will create 'tfile.csv' in same directory
    open OFILE, ">./tfile.csv" or die "Can't create output file.\n";

    while ( read BFILE, $buf, 64 ) {
        my $bindex = 0;
        @barray = unpack('C*', $buf);
        for ( my $batch = 0; $batch < 16; $batch++ ) {
            $acc = 0;
            for ( my $i = 0; $i < 4; $i++ ) {
                if ( exists( $barray[ $bindex ] ) ) {
                    $byt = $barray[ $bindex++ ];
                    $acc *= 256;
                    $acc += $byt;
                }
            }
            print OFILE '"' . $acc . "\",";
        }
        print OFILE "\n";
    }

    close BFILE;
    close OFILE;

I didn't fully test it beyond a sanity check, and if you use this in production you'll want to add little niceties like not writing the very last comma and ignoring null bytes. Still, this should take each four bytes of the file, convert them to an unsigned integer (most significant byte first), and write them out in groups of 16 per line. The invocation might be a little different on a Windows-based system; this was done on Linux:

    din@foobar $ emacs tfile.pl &
    din@foobar $ chmod +x tfile.pl
    din@foobar $ ./tfile.pl
    din@foobar $
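
For what it's worth, here is a rough sketch of the "no trailing comma" nicety. It swaps the inner byte loop for unpack's 'N' template (unsigned 32-bit, big-endian), which gives the same values for complete four-byte groups. The sub name decode_chunk is made up for illustration, and padding a short final chunk with null bytes is an assumption (the loop above folds only the bytes that actually exist), so treat it as a starting point rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one chunk of raw bytes into a CSV line.
sub decode_chunk {
    my ($buf) = @_;

    # Assumption: pad a short final chunk with null bytes up to a multiple of 4.
    my $pad = ( 4 - length($buf) % 4 ) % 4;
    $buf .= "\0" x $pad;

    # 'N*' reads every 4 bytes as an unsigned 32-bit big-endian integer.
    my @values = unpack( 'N*', $buf );

    # join puts commas only between fields, so there is no trailing comma.
    return join( ',', map { qq("$_") } @values ) . "\n";
}

# Eight sample bytes, i.e. two 4-byte groups.
print decode_chunk( pack( 'C*', 0, 0, 1, 0, 0, 0, 0, 255 ) );    # prints "256","255"
```

Inside the read loop you would then write `print OFILE decode_chunk($buf);` in place of the two nested for loops.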


Good luck, and welcome to the Monastery! :D

UPDATE: added caveat about null bytes, and comments. UPDATE2: Changed $b to $byt (as per Albannach) so as not to use special sorting variable $b.

Don Wilde
"There's more than one level to any answer."

Re^2: Converting Files
by Grundle (Scribe) on May 30, 2007 at 15:05 UTC
    You are exactly right! There may be some special cases that will throw your

    unpack('C*', $buf);

    off. Mainly, when you have a binary file there can be BCD (Binary Coded Decimal) contained within. This is generally used for fixed-point decimal values (money or other fractional data). There may also be encoded date fields (generally consisting of 2 bytes).
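
    To make the BCD case concrete, here is a minimal sketch that assumes unsigned packed BCD (two decimal digits per byte, no sign nybble). The sub name bcd_to_decimal and the $scale argument are hypothetical; real formats often add a sign nybble or a different layout, so check against known values first.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: decode unsigned packed BCD, two decimal digits per byte.
    # $scale is the assumed number of digits after the decimal point.
    sub bcd_to_decimal {
        my ( $bytes, $scale ) = @_;

        # 'H*' yields the hex digits, high nybble first; for valid BCD
        # those are exactly the decimal digits.
        my $digits = unpack( 'H*', $bytes );
        die "not valid BCD: $digits\n" if $digits =~ /[a-f]/i;

        # Insert the decimal point $scale digits from the right.
        substr( $digits, -$scale, 0, '.' ) if $scale;
        return $digits;
    }

    print bcd_to_decimal( "\x12\x34\x56", 2 ), "\n";    # prints 1234.56
    ```

    Note that unpack('C*', ...) on those same three bytes would give 18, 52, 86, which is the sort of garbled field to look out for.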

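    Also watch out for multi-byte numbers. For example, you have 2 bytes, 0x01 and 0x02. Read separately they are 1 and 2, but a multi-byte number joins the two before converting to decimal: 0x0102 == 258 (00000001 00000010 in binary). If the file stores the low byte first, the same two bytes mean 0x0201 == 513.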
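
    A small sketch of how the same two bytes read under different unpack templates. The templates 'C', 'n' and 'v' are standard; which one is right for a given field is something you have to work out from the file format.

    ```perl
    use strict;
    use warnings;

    my $two_bytes = "\x01\x02";

    # Read as two independent bytes, which is what 'C*' does.
    my @single = unpack( 'C*', $two_bytes );    # (1, 2)

    # Read as one unsigned 16-bit value in either byte order.
    my $big    = unpack( 'n', $two_bytes );     # 0x0102 = 258
    my $little = unpack( 'v', $two_bytes );     # 0x0201 = 513

    # For comparison, a single byte 0x12 (nybbles 0001 0010) is 18.
    my $one_byte = unpack( 'C', "\x12" );

    print "bytes: @single  big-endian: $big  little-endian: $little  0x12: $one_byte\n";
    ```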

    Finally, some binary files will contain pointers (a.k.a. memory locations, in practice usually byte offsets) to other locations in the same file or in other files. If this is the case you will need to follow each pointer so that you do not lose any data.
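
    Following a pointer usually comes down to seek plus read. The layout below is entirely made up for illustration (a 4-byte big-endian offset at the start of the file pointing to a 5-byte record), and an in-memory filehandle stands in for the real file.

    ```perl
    use strict;
    use warnings;

    # Made-up layout: bytes 0-3 hold a big-endian offset to a
    # 5-byte record stored later in the same file.
    my $data = pack( 'N', 8 ) . "\0\0\0\0" . "HELLO";

    # An in-memory filehandle stands in for the real binary file.
    open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
    binmode $fh;

    read( $fh, my $raw, 4 ) == 4 or die "Short read on pointer field\n";
    my $offset = unpack( 'N', $raw );

    # Follow the pointer: jump to the absolute offset and read the record.
    seek( $fh, $offset, 0 ) or die "Can't seek to $offset: $!\n";
    read( $fh, my $record, 5 ) == 5 or die "Short read on record\n";

    close $fh;
    print "offset $offset -> $record\n";    # prints offset 8 -> HELLO
    ```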

    I would suggest first parsing the file as samizdat has suggested. Most likely you will find garbled fields after you are done. These garbled fields will probably fall into one of the categories I have mentioned. From there on out, you become Sherlock Holmes and try to determine what each one of them is.

    Some tools to help you look at the binary data are hexdump or a combination of dd + hexdump (if you know the block-size of each record).

    i.e. (dd if=binaryfile bs=blocksize | hexdump -c | more)

    That is, if you are using a *nix variant. If you do not have a *nix OS, then install Cygwin and use it for these command-line tools.
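
    If neither option is handy, a few lines of Perl give a rough equivalent. This is only a sketch: the sub name hex_lines is hypothetical and the column layout is merely modelled loosely on hexdump's output, not copied from it.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: format bytes as offset, hex, and printable text,
    # 16 bytes per line.
    sub hex_lines {
        my ($bytes) = @_;
        my @lines;
        for ( my $off = 0; $off < length($bytes); $off += 16 ) {
            my $chunk = substr( $bytes, $off, 16 );
            my $hex   = join ' ', map { sprintf '%02x', $_ } unpack( 'C*', $chunk );
            ( my $text = $chunk ) =~ tr/\x20-\x7e/./c;    # non-printables become dots
            push @lines, sprintf( '%08x  %-47s  |%s|', $off, $hex, $text );
        }
        return @lines;
    }

    print "$_\n" for hex_lines("AB\x00\x01");
    ```

    To dump a real file, read it with binmode set and pass each record-sized block to hex_lines.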