Bin\Hex Parsing in Perl ?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Bin\Hex Parsing in Perl ? by tachyon (Chancellor) on Oct 25, 2001 at 14:35 UTC
OK, the first thing you need to understand is that all files are binary files at heart, as this is all that computers really deal with and store on disk. We differentiate 'text' files from 'binary' files, but this is a somewhat artificial distinction. Both are a string of 0s and 1s. With a text file we arbitrarily assign meaning to each 7-8 bits via the ASCII/Extended ASCII character encoding. For example, binary 00100000 (decimal 32, hex 20) is said to represent the space character in ASCII.

Hexadecimal notation is a convenient shorthand when dealing with binary data. A sequence of 4 bits of binary data can represent the decimal numbers 0-15 or the hexadecimal digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Thus 11110000 in binary can be represented as F0 in hex (often written 0xF0, where the 0x says this is a hex number).

In a 'text' file each 'line' will end with a line ending sequence. This is system specific: LF (Line Feed, \n, 0x0A) on Unix, CR (Carriage Return, \r, 0x0D) on Mac and CRLF (\r\n, 0x0D 0x0A) on Windows. When you read in a file in Perl using the angle input operator <FILE>, Perl reads in all the binary data until it finds a line ending (as defined in the $/ special var). Perl also automagically converts line endings from the system specific one to \n for internal use when reading, and back to the system specific one when writing. You can stop this conversion occurring using binmode(). When you read() a file you just get the number of bytes you asked for (fewer at end of file), with no regard for line endings.

Regardless of how you get your data in, you still have sequences of binary bits to which we assign meaning. The major difference in a binary file is that 0x0A means the literal binary 00001010 rather than being the \n line ending. You can print a binary file. What you see is the ASCII characters that correspond to each 7-8 bits of binary data. Generally this looks like gibberish; however, raw text stored in the binary file as its ASCII encoding will look like normal text.
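To make the correspondence between decimal, hex and binary concrete, here is a minimal sketch (not part of the original post) that prints the values mentioned above. The variable names are only illustrative.

```perl
use strict;
use warnings;

# The space character is byte value 32 (hex 20, binary 00100000).
my $space = ' ';
my $dec   = ord $space;               # numeric value of the character
my $hex   = sprintf '%02x', $dec;     # same value as two hex digits
my $bits  = unpack 'B8', $space;      # same byte as eight bits
print "dec=$dec hex=$hex bits=$bits\n";

# 11110000 in binary is F0 in hex.
my $f0 = sprintf '%02X', oct '0b11110000';
print "0b11110000 = 0x$f0\n";

# A Windows line ending is the two bytes 0x0D 0x0A.
my $crlf = unpack 'H*', "\x0d\x0a";
print "CRLF = $crlf\n";
```

The bytes are written as \x0d\x0a rather than \r\n so that the result does not depend on how the platform maps those escapes.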
Have a look at a M$ .doc file with a text editor. The gibberish is the binary data denoting all the formatting details. Somewhere within the file you will find the actual raw text in human readable form. There are other bits of raw text in there too.

In a binary file we can define any structure we like. Say we want to store file permissions efficiently. 755 can be represented as 111 101 101 in binary. To store this as text we would use 3 bytes or 3 × 8 = 24 bits, but as raw binary we only need 9 bits, or roughly 1/3 the space. This is part of the appeal of binary formats. You may also start to get an idea about how compression works.

Here is a bit of code that will read in a whole file and convert it to hex, so each digit we see stands for a sequence of 4 binary bits.

```perl
my $file = 'c:/test.txt';
# undef $/ to read the whole file in one go
undef $/;
open FILE, '<', $file or die "Can't open $file $!\n";
# binmode FILE to suppress conversion of line endings
binmode FILE;
my $data = <FILE>;
close FILE;
# convert data to hex form ('H*' means all of it, not just the first digit)
my $hex = unpack 'H*', $data;
print $hex;
__END__
4a75737420416e6f74686572205065726c204861636b65720d0a
```

The data after the __END__ is a sample output. Put it in a file and change unpack() to pack() in the code above to read the original data in text form. Note the 0d0a at the end, which represents a Windows line ending. If you wanted to see the raw binary, change 'H*' to 'B*'. At the end of the binary you will then see 0000110100001010, or with a little formatting 0000 1101 0000 1010, which is a CRLF sequence in raw binary format - it is much easier to read in hex!

For an interesting discussion on pack() and unpack() and more on binary stuff see Confession of a Perl Hacker - it's a great node.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
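The round trip described above can be sketched without touching the disk. This is an illustration rather than tachyon's own code: it feeds the sample hex string straight to pack(), then shows the last two bytes as bits and the 755 permission value as a 9-bit number. The variable names are assumptions made for the example.

```perl
use strict;
use warnings;

# Sample hex dump quoted in the post above.
my $hex = '4a75737420416e6f74686572205065726c204861636b65720d0a';

# pack 'H*' turns each pair of hex digits back into one byte.
my $text = pack 'H*', $hex;

# unpack 'H*' goes the other way; 'B*' shows the same bytes as bits.
my $again = unpack 'H*', $text;
my $tail  = unpack 'B*', substr($text, -2);    # the final two bytes (CR LF)

(my $visible = $text) =~ s/\x0d\x0a\z//;       # strip the CRLF for display only
print "text: $visible\n";
print "round trip ok\n" if $again eq $hex;
print "last two bytes as bits: $tail\n";

# Permissions 755: three text characters (24 bits) versus a 9-bit value.
my $mode      = oct '755';
my $mode_bits = sprintf '%09b', $mode;
print "755 as bits: $mode_bits\n";
```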
Re: Bin\Hex Parsing in Perl ? by jeroenes (Priest) on Oct 25, 2001 at 09:46 UTC
Other monks taught me how to do this some while ago. I am a bit confused, though, whether you are trying to read/write a binary file or a text file populated with hexadecimal characters. Your method seems fine to me, but writing in general can be done just using the print command, whether or not your strings are binary. On win32 you do need the binmode command, though.

For reading you could also use this to fetch fixed-length records:

```perl
my $rec;
do_something($rec) while read(FILE, $rec, $reclength);
```

With unpack you can translate both binary and hexadecimal data:

```perl
@hd     = unpack 'h*', $rec;   # hex digits, low nybble first ('H*' gives high nybble first)
@floats = unpack 'f*', $rec;   # native single-precision floats
```

So what is the exact name of what you are trying to do? Reading binary files, unpacking .... I will have a look into that Data::HexDump module. It's probably worth it.

Mehopes it's all a bit clearer to you now.

Jeroen
"We are not alone"(FZ)

Update: I've read the HexDump docs, and what it does is just to make a binary file ready for human reading, like in a hex viewer. So it's very usable to dump a bin file on a tty, but not fit to parse a binary file for subsequent use in a script. See the code above.
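As a rough, self-contained sketch of the read()/unpack() combination discussed above, the following reads fixed-length records from an in-memory filehandle. The record layout (a 4-byte tag plus two big-endian integers) and all names are hypothetical, chosen only for illustration; a real file format would dictate its own template.

```perl
use strict;
use warnings;

# Hypothetical record layout: a 4-byte ASCII tag, then a 16-bit and a
# 32-bit unsigned integer, both big-endian (10 bytes per record).
my $template  = 'A4 n N';
my $reclength = 10;

# Build two sample records in memory instead of relying on a file on disk.
my $data = pack($template, 'HEAD', 1, 1000)
         . pack($template, 'BODY', 2, 70000);

open my $fh, '<', \$data or die "Can't open in-memory file: $!\n";
binmode $fh;

my @records;
while (read($fh, my $rec, $reclength)) {
    last if length($rec) < $reclength;    # ignore a truncated trailing record
    push @records, [ unpack $template, $rec ];
}
close $fh;

printf "%s %d %d\n", @$_ for @records;
```

The same template string is used for pack() and unpack(), which keeps the writer and the reader in agreement about the layout.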
Re: Bin\Hex Parsing in Perl ? by astaines (Curate) on Oct 25, 2001 at 02:42 UTC
pack is your friend, well unpack actually... Also try looking on CPAN to see if anyone has done whatever you want to do already. -- Anthony Staines
Re: Bin\Hex Parsing in Perl ? by toma (Vicar) on Oct 26, 2001 at 18:44 UTC
Here is a hex dump program similar to the one posted by tachyon. This one is for relatively small files where you want to examine each byte carefully. I used it for debugging code that generates .wav files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;

=head1 NAME

dump.pl

=head1 SYNOPSIS

perl dump.pl chimes.wav > chimes.raw

=head1 DESCRIPTION

This routine reads a file and prints out an ASCII dump.

=cut

undef $/;    # slurp mode: read the whole file in one go
$_ = <>;
my $i = 0;
# 'C*' unpacks every byte as an unsigned number
for (unpack("C*", $_)) {
    print $i++ . "\t" . $_ . "\t" . chr($_) . "\n";
}
```

It should work perfectly the first time! - toma*
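A possible variation on the dump above, offered as a sketch rather than as toma's method: it adds a hex column and prints a dot for bytes that are not printable ASCII, so control characters such as CR and LF do not disturb the output. The sub name dump_bytes is made up for this example.

```perl
use strict;
use warnings;

# Return one line per byte: offset, decimal value, hex value, and the
# character itself (or '.' when the byte is not printable ASCII).
sub dump_bytes {
    my ($data) = @_;
    my $offset = 0;
    my @lines;
    for my $byte (unpack 'C*', $data) {
        my $char = ($byte >= 0x20 && $byte <= 0x7e) ? chr $byte : '.';
        push @lines, sprintf("%d\t%d\t%02x\t%s", $offset++, $byte, $byte, $char);
    }
    return @lines;
}

print "$_\n" for dump_bytes("Hi\x0d\x0a");
```

For the four-byte sample string this prints four tab-separated lines, one each for H, i, CR and LF.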