You are exactly right! There may be some special cases that will throw your
unpack('C*', $buf);
off. Mainly, a binary file can contain BCD (Binary Coded Decimal) fields. These are generally used for decimal values with a fractional part (money or other fractional data), typically packed two decimal digits to a byte. There may also be encoded date fields (generally 2 bytes long).
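To illustrate the BCD case, here is a minimal sketch. It assumes unsigned packed BCD (two digits per byte, no sign nibble) and an implied decimal point whose position you already know from the record layout. The helper name bcd_to_string is made up for this example.

```perl
use strict;
use warnings;

# Decode an unsigned packed-BCD field: each byte holds two decimal digits,
# one per 4-bit nibble. $places is the implied number of decimal places
# (an assumption about the record layout, not something stored in the file).
sub bcd_to_string {
    my ($bytes, $places) = @_;
    my $digits = unpack('H*', $bytes);      # "\x12\x34\x56" -> "123456"
    die "not valid BCD: $digits\n" if $digits =~ /[^0-9]/;
    # Insert the implied decimal point, if any.
    substr($digits, -$places, 0, '.') if $places;
    return $digits;
}

print bcd_to_string("\x12\x34\x56", 2), "\n";   # prints 1234.56
```

If the result contains the hex digits a-f, the field is probably not plain BCD (or it carries a sign nibble), which is why the sketch dies instead of guessing.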
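Also watch out for multi-byte numbers. E.g. if you have the 2 bytes 0x01 and 0x02, unpack('C*', $buf) gives you 1 and 2, but a two-byte number joins those bytes before converting to decimal: 0x0102 == 258 if the high byte comes first (big-endian), or 0x0201 == 513 if the low byte comes first (little-endian).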
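A short sketch of the difference, using the same two bytes. Which byte order your file actually uses is something you have to find out; both are shown here.

```perl
use strict;
use warnings;

my $buf = "\x01\x02";

my @bytes    = unpack('C*', $buf);   # two separate bytes: (1, 2)
my ($big)    = unpack('n', $buf);    # 16-bit big-endian:    0x0102
my ($little) = unpack('v', $buf);    # 16-bit little-endian: 0x0201

print "bytes: @bytes\n";             # bytes: 1 2
print "big-endian: $big\n";          # big-endian: 258
print "little-endian: $little\n";    # little-endian: 513
```

For 32-bit values the corresponding templates are 'N' (big-endian) and 'V' (little-endian).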
Finally, some binary files contain pointers (i.e. offsets) to other locations in the same file or in other files. If this is the case, you will need to follow each pointer so that you do not lose any data.
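Following a pointer within the same file usually comes down to seek plus read. The sketch below uses a layout invented purely for illustration (a 4-byte big-endian offset at the start of the file, pointing at a record made of a 1-byte length and that many bytes of text). The sub name read_pointed_record is also hypothetical, and your file will almost certainly differ.

```perl
use strict;
use warnings;

# Hypothetical layout, for illustration only: the file begins with a 4-byte
# big-endian offset that points at a record made of a 1-byte length
# followed by that many bytes of text.
sub read_pointed_record {
    my ($fh) = @_;
    binmode $fh;

    seek($fh, 0, 0) or die "seek failed: $!";
    read($fh, my $ptr, 4) == 4 or die "short read on pointer";
    my $offset = unpack('N', $ptr);

    # Follow the pointer to the record it refers to.
    seek($fh, $offset, 0) or die "seek failed: $!";
    read($fh, my $len_byte, 1) == 1 or die "short read on length";
    my $len = unpack('C', $len_byte);
    read($fh, my $text, $len) == $len or die "short read on record";
    return $text;
}

# Build a small sample in memory: pointer (8), 4 bytes of padding, record.
my $sample = pack('N', 8) . "\0" x 4 . pack('C/a*', 'hello');
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
print read_pointed_record($fh), "\n";   # prints hello
```

With a real file you would open it by name instead of using the in-memory string.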
I would suggest first parsing the file as samizdat has suggested. Most likely you will find garbled fields when you are done. These garbled fields will probably fall into one of the categories I have mentioned. From there on out, you become Sherlock Holmes and try to determine what each one of them is.
Some tools to help you look at the binary data are hexdump, or a combination of dd + hexdump (if you know the block size of each record), e.g.
dd if=binaryfile bs=blocksize | hexdump -c | more
That is, if you are using a *nix variant. If you do not have a *nix OS, then install Cygwin and use it for these command line tools.
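If you would rather stay in Perl, something along these lines gives a rough record-by-record dump. It is only a sketch: the sub name dump_records is made up, and the record size is whatever block size you suspect the file uses.

```perl
use strict;
use warnings;

# Return one hex-dump line per fixed-size record. $recsize is whatever
# block size you believe the file uses (an assumption to experiment with).
sub dump_records {
    my ($fh, $recsize) = @_;
    binmode $fh;
    my @lines;
    my $offset = 0;
    while (my $got = read($fh, my $rec, $recsize)) {
        my $hex = join ' ', unpack('(H2)*', $rec);
        (my $ascii = $rec) =~ tr/\x20-\x7e/./c;   # non-printables shown as "."
        push @lines, sprintf('%08x  %s  |%s|', $offset, $hex, $ascii);
        $offset += $got;
    }
    return @lines;
}

# Sample data in memory; with a real file, open it by name instead.
my $data = "AB\x01\x02CD\x00\xff";
open(my $in, '<', \$data) or die "cannot open in-memory file: $!";
print "$_\n" for dump_records($in, 4);
```

Trying a few different record sizes and watching for the point where the columns line up is often a quick way to guess the real block size.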