Re: Confession of a Perl Hacker

As clemburg pointed out, the widespread use of text-based data-transmission standards (HTML, XML and the like) in place of the "old-school" binary formats means that the pack() function is showing up in fewer and fewer programs and modules. That people can live a long and fulfilling life without ever touching it is perhaps a sign of progress.

pack() allows you to build "packed binary" scalars. In other words, the pack() template specifies how you want things organized in memory, byte by byte. When you create an array, you can't really be sure how Perl lays it out internally, but once you pack() that array, you know exactly where every byte stands. Why would you care? It depends on your application: if the input or output data comes in a precisely defined "binary" format, you will likely be using pack() and unpack() to interface with it.
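To make the "byte-by-byte" idea concrete, here is a minimal sketch. The values 1, 2 and 3 are arbitrary, and the template letters are just one possible choice of layout:

```perl
use strict;
use warnings;

# Pack three numbers with an explicit layout:
#   C = one unsigned byte
#   n = 16-bit unsigned integer, big-endian ("network" order)
#   N = 32-bit unsigned integer, big-endian
my $packed = pack("C n N", 1, 2, 3);

# 1 + 2 + 4 = 7 bytes, regardless of how Perl stores numbers internally.
printf "length: %d\n", length($packed);

# Dump the bytes as hex to see exactly where each value landed.
printf "bytes:  %s\n", unpack("H*", $packed);
```

The hex dump shows one byte for the first value, two for the second and four for the third, in that order.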

For example, binary files like GIF and JPEG have headers that are stored in binary, not ASCII, and they need to be "decoded" before Perl can make sense of them. These file formats were designed with C programs in mind, and C programs work in a different way than Perl does.
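As a rough illustration, a GIF file begins with a 6-byte signature followed by the image width and height as 16-bit little-endian integers. The sketch below builds such a header in memory instead of reading a real file, so the 640 and 480 are made-up values:

```perl
use strict;
use warnings;

# Fabricated first 10 bytes of a GIF file: signature, width, height.
my $header = "GIF89a" . pack("v v", 640, 480);

# a6 = 6 raw bytes, v = 16-bit unsigned integer, little-endian
my ($signature, $width, $height) = unpack("a6 v v", $header);

print "$signature: $width x $height\n";
```

With a real file you would read() the first 10 bytes from a filehandle in binmode and unpack them the same way.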

Where Perl programmers work with scalars (e.g. strings), arrays and hashes, C programmers work with some basic variable types and struct definitions. A "struct" is really just a handful of variables crammed together, end-to-end, into a manageable package that can be allocated, deleted, copied, passed from function to function, and what have you without worrying too much about the internals. Most C programs make use of "struct" like Perl programs make use of arrays and hashes, as convenient ways to store data.

Here's an example that illustrates the difference:

   Perl:

           my (%record) = ();

           $record{'id'}   = 419;
           $record{'time'} = time();
           $record{'name'} = "Quentin";

   C:

           struct record
           {
                int    id;
                time_t time;
                char   name[8];
           } a_record;

           a_record.id   = 419;
           a_record.time = time(NULL);
           strcpy (a_record.name, "Quentin");

In the Perl example, you could put anything into the hash %record without concern for its type, or even for the key you are storing it under. In C, though, you have to specify what "keys" you can use, and more specifically, what type of data each is prepared to accept. "id" can only be an "int", and "name" can only hold 8 bytes (i.e. a "string" of up to 7 characters plus its terminator). C is pretty strict about that stuff, and if you step outside the lines, either the compiler freaks out, or your program crashes or behaves strangely.

Here's where pack() and unpack() come into play. Let's say you had to read data from a file that was created by a C program that used the "record" struct, and you want to modify some of this stuff and put it back right where it came from. Here's how you might go about doing that:

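     my (%record) = ();
     my ($packed_record);
     my ($packed_record_size) = 4+4+8;    # int + time_t + char[8]

     # Open the file in binary mode and read a single record out of it.
     # $data_file is assumed to hold the path to the data file.
     open (my $in, '<', $data_file) or die "Can't read $data_file: $!";
     binmode ($in);
     read ($in, $packed_record, $packed_record_size) == $packed_record_size
         or die "Short read from $data_file";
     close ($in);

     # Unpack the record to decode it
     ($record{'id'},$record{'time'},$record{'name'})
         = unpack ("lla8", $packed_record);

     # Make a change
     $record{'time'} = time;

     # Pack the record again to encode it
     $packed_record = pack ("lla8",
         $record{'id'},$record{'time'},$record{'name'});

     # Write the record back to the file (print to the handle, not STDOUT)
     open (my $out, '>', $data_file) or die "Can't write $data_file: $!";
     binmode ($out);
     print $out $packed_record;
     close ($out) or die "Can't close $data_file: $!";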
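If you want to try the round trip without a data file, something like the following sketch works entirely in memory. The sample values are made up, and the template assumes the 32-bit layout used in this post:

```perl
use strict;
use warnings;

# Two 32-bit signed integers, then an 8-byte NUL-padded string.
my $template = "l l a8";

# Encode a sample record (arbitrary values).
my $packed = pack($template, 419, 1_000_000_000, "Quentin");
printf "packed size: %d bytes\n", length($packed);

# Decode it again.
my ($id, $time, $name) = unpack($template, $packed);

# 'a8' returns all 8 bytes, padding included, so trim the NULs.
$name =~ s/\0+\z//;

print "id=$id time=$time name=$name\n";
```

The packed size should match the 4+4+8 used for the read() call.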

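The first parameter of the pack() and unpack() calls is dictated by the format of the struct. In this case, the first two variables are packed as 'l', a signed 32-bit integer, because 'int' and 'long' are both 32 bits wide on most 32-bit compilers and 'time_t' is typically an alias for 'long' there. On platforms where 'time_t' is 64 bits wide, or where the compiler inserts padding between members, the template and the record size have to change to match. The reason for using 'a' instead of 'A' is that pack() pads 'a' fields with NUL bytes, which is how C expects a string to end, whereas 'A' pads with spaces. In other words, the string "Quentin" is actually represented in memory as follows: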

      'Q' 'u' 'e' 'n' 't' 'i' 'n' \x00

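The last byte is used by the C library to figure out where the string is supposed to stop. Perl keeps track of the length of each string separately, so you don't have to fuss about ASCII 0 bytes in your strings, thankfully. Note that unpack() with 'a8' hands that trailing NUL back as part of the string; the 'Z8' template strips it for you.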
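The difference between the string templates is easy to see by dumping the bytes. This is only a small sketch using the same 8-byte field as the example struct:

```perl
use strict;
use warnings;

# Packing: 'a' pads with NUL bytes, 'A' pads with spaces.
my $nul_padded   = pack("a8", "Quentin");
my $space_padded = pack("A8", "Quentin");

printf "a8: %s\n", unpack("H*", $nul_padded);     # ends in 00
printf "A8: %s\n", unpack("H*", $space_padded);   # ends in 20

# Unpacking the NUL-padded field three ways:
my $raw      = unpack("a8", $nul_padded);   # keeps the trailing NUL
my $c_string = unpack("Z8", $nul_padded);   # stops at the first NUL
my $trimmed  = unpack("A8", $nul_padded);   # strips trailing NULs and spaces

printf "a8 gives %d chars, Z8 gives %d, A8 gives %d\n",
    length($raw), length($c_string), length($trimmed);
```

For data written by a C program, 'Z' is usually the closest match to what the C side meant by "string".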

Basically, if you need to use pack() and unpack(), you will have to figure out the format of what you're reading, which is usually described in a C context, and more often than not, in the form of ".h" header files or RFCs which show you how the bytes are organized and should be decoded.
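As an example of working from an RFC rather than a header file, RFC 768 describes the UDP header as four 16-bit fields in network byte order: source port, destination port, length and checksum. A minimal sketch of decoding one, using a fabricated datagram (the port numbers, payload and zero checksum are made up for illustration):

```perl
use strict;
use warnings;

# Fabricated datagram: 8-byte header followed by a 5-byte payload.
my $payload  = "hello";
my $datagram = pack("n n n n", 33000, 53, 8 + length($payload), 0) . $payload;

# n = 16-bit unsigned integer, big-endian ("network" order)
my ($src_port, $dst_port, $length, $checksum) = unpack("n n n n", $datagram);

# The length field counts header plus data, so the payload is what remains.
my $data = substr($datagram, 8, $length - 8);

printf "%d -> %d, %d bytes, payload '%s'\n", $src_port, $dst_port, $length, $data;
```

The template is a direct transcription of the field diagram in the RFC, which is usually all there is to it.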

The documentation on pack() and unpack() is probably so terse because the utility and application of these functions are pretty clear to most C-type programmers who have used 'struct'. Certainly, though, it needs to be improved to be intelligible to your average modern Perl programmer.
