OK, the first thing you need to understand is that all files are binary files at heart, as this is all that computers really deal with and store on disk.
We differentiate 'text' files from 'binary' files, but this is a somewhat artificial distinction. Both kinds of file are a string of 0s and 1s. With a text file we assign meaning to each group of 7-8 bits by convention, via the ASCII/Extended ASCII character encoding. For example binary 00100000 (decimal 32, hex 20) is said to represent the space character in ASCII.
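If you want to see that mapping for yourself, here is a minimal sketch using Perl's ord() and sprintf(). The sub name describe_char is just something I made up for illustration.

```perl
use strict;
use warnings;

# Return the decimal, hex and binary forms of a single character's code.
# describe_char is a hypothetical helper name, not a built-in.
sub describe_char {
    my ($char) = @_;
    my $code = ord $char;
    return sprintf '%d 0x%02X %08b', $code, $code, $code;
}

print describe_char(' '), "\n";   # 32 0x20 00100000
print describe_char('A'), "\n";   # 65 0x41 01000001
```

Going the other way, chr(32) gives you the space character back.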
Hexadecimal notation is a convenient shorthand when dealing with binary data. A sequence of 4 bits of binary data can represent the decimal numbers 0-15 or the hexadecimal digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Thus 11110000 in binary can be represented as F0 in hex (often written 0xF0, where the 0x says this is a hex number).
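A quick sketch of converting between the two notations, assuming you are dealing with a single byte (8 bits). The sub names bits_to_hex and hex_to_bits are invented for this example; oct(), hex() and sprintf() are the real built-ins doing the work.

```perl
use strict;
use warnings;

# Convert a string of up to 8 binary digits to two hex digits.
sub bits_to_hex {
    my ($bits) = @_;
    return sprintf '%02X', oct "0b$bits";   # oct() understands the 0b prefix
}

# Convert one or two hex digits to an 8 digit binary string.
sub hex_to_bits {
    my ($hex) = @_;
    return sprintf '%08b', hex $hex;
}

print bits_to_hex('11110000'), "\n";   # F0
print hex_to_bits('F0'), "\n";         # 11110000
```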
In a 'text' file each 'line' will end with a line ending sequence. This is system specific: LF (Line Feed, \n, 0x0A) on Unix, CR (Carriage Return, \r, 0x0D) on classic Mac OS and CRLF (\r\n, 0x0D 0x0A) on Windows. When you read in a file in Perl using the angle input operator <FILE> Perl reads in all the binary data until it finds a line ending (as defined in the $/ special var). On systems where it is needed, such as Windows, Perl also automagically converts line endings from the system specific one to \n for internal use when reading and back to the system specific one when writing. You can stop this conversion occurring using binmode(). When you read() a file you just get however many bytes you ask for (fewer at end of file), regardless of line endings.
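Once you have raw data (read with binmode() on, so nothing has been converted) you can check which line ending it uses. This is only a rough sketch: line_ending_of is a made-up helper, and it just reports the first style that matches rather than coping with mixed files. The bytes are written as \x0D and \x0A so the check means the same thing on every platform.

```perl
use strict;
use warnings;

# Guess the line ending style of some raw, unconverted data.
# Check CRLF first, since it contains both of the other sequences.
sub line_ending_of {
    my ($data) = @_;
    return 'CRLF' if $data =~ /\x0D\x0A/;
    return 'LF'   if $data =~ /\x0A/;
    return 'CR'   if $data =~ /\x0D/;
    return 'none';
}

print line_ending_of("one\x0D\x0Atwo\x0D\x0A"), "\n";   # CRLF
print line_ending_of("one\x0Atwo\x0A"), "\n";           # LF
```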
Regardless of how you get your data in, you still have sequences of binary bits to which we assign meaning. The major difference in a binary file is that 0x0A means the literal binary 00001010 rather than being the \n line ending. You can print a binary file. What you see is the ASCII characters that correspond to each 7-8 bits of binary data - generally this looks like gibberish, however raw text stored in the binary file as its ASCII encoding will look like normal text. Have a look at an M$ Word .doc file with a text editor. The gibberish is the binary data denoting all the formatting details. Somewhere within the file you will find the actual raw text in human readable form. There are other bits of raw text in there too.
In a binary file we can define any structure we like. Say we want to store file permissions efficiently. 755 can be represented as 111 101 101 in binary. To store this as text we would use 3 bytes or 3*8=24 bits but as raw binary we only need 9 bits, or 3/8 of the space. This is part of the appeal of binary formats. You may also start to get an idea about how compression works.
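To make the permissions example concrete, here is a small sketch. One caveat: files are stored in whole bytes, so on its own the 9 bit value ends up in 2 bytes (16 bits) here rather than exactly 9 bits. The choice of the 'n' template (16 bit big-endian) is my assumption for the example; a real format could pack the spare bits with something else.

```perl
use strict;
use warnings;

my $text   = '755';                  # 3 bytes when stored as text
my $mode   = oct $text;              # 493 as a number
my $bits   = sprintf '%09b', $mode;  # the 9 bits: 111101101
my $packed = pack 'n', $mode;        # 2 bytes of raw binary

print length($text), " bytes as text\n";
print length($packed), " bytes as binary\n";
print "bits: $bits\n";
print "hex:  ", unpack('H*', $packed), "\n";               # 01ed
print "back: ", sprintf('%o', unpack 'n', $packed), "\n";  # 755
```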
Here is a bit of code that will read in a whole file, and convert it to Hex - we are thus seeing sequences of 4 binary bits.
use strict;
use warnings;

my $file = 'c:/test.txt';
# three argument open with a lexical filehandle, read mode
open my $fh, '<', $file or die "Can't open $file: $!\n";
# binmode to suppress conversion of line endings
binmode $fh;
# localise $/ as undef to read the whole file in one go
my $data = do { local $/; <$fh> };
close $fh;
# convert data to hex form
my $hex = unpack 'H*', $data;
print $hex;
__END__
4a75737420416e6f74686572205065726c204861636b65720d0a
The data after the __END__ is sample output. Put it in a file and change unpack() to pack() in the code above to read the original data in text form. Note the 0d0a at the end, which represents a Windows line ending. If you want to see the raw binary, change 'H*' to 'B*'. At the end of the binary you will then see 0000110100001010 or, with a little formatting, 0000 1101 0000 1010, which is a CRLF sequence in raw binary format - it is much easier to read in hex!
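Here is that round trip as a short sketch that works on the sample hex string directly rather than reading it from a file. The variable names are just mine.

```perl
use strict;
use warnings;

my $hex  = '4a75737420416e6f74686572205065726c204861636b65720d0a';
# pack 'H*' turns each pair of hex digits back into one byte
my $data = pack 'H*', $hex;
# unpack 'B*' shows the same bytes as a string of 0s and 1s
my $bits = unpack 'B*', $data;
# the last 16 bits are the CRLF, split into groups of 4 for readability
my $tail = join ' ', substr($bits, -16) =~ /(.{4})/g;

print substr($data, 0, -2), "\n";   # Just Another Perl Hacker
print "$tail\n";                    # 0000 1101 0000 1010
```

The substr() on the first print just drops the trailing CRLF bytes so that we print our own \n instead.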
For an interesting discussion on pack() and unpack() and more on binary stuff see Confession of a Perl Hacker - it's a great node.
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print