Corion has asked for the wisdom of the Perl Monks concerning the following question:

I am currently thinking of porting some of my Delphi / Turbo Pascal utilities over to Perl. But I have run into a design problem which I want to post for discussion here:

Pascal has the notion of records, that is, a user-defined compound type, more or less like the struct {} construct in C, AFAIK. My utilities will parse many such records to extract information from files. My current idea is to map each record to a hash of name->value pairs read from the file. This can be done in a number of beautiful ways; currently I favor a generic "ReadRecord" routine that gets passed a string suitable for a call to unpack() together with a list of names for each unpacked value. Another, maybe less error-prone approach would be to pass in a list of type->name pairs, which would make adding and moving record members easier.
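For that second variant, a rough sketch (the function name and example fields here are mine, purely for illustration) of turning type->name pairs into an unpack() template plus a name list might look like this:

```perl
# Hypothetical sketch: build an unpack() template and a name list from
# (type => name) pairs, so record members are easy to add, remove or move.
sub read_record_pairs {
    my @pairs = @_;                  # e.g. ( 'N' => 'length', 'A4' => 'tag' )
    my ($template, @names) = ('');
    while (@pairs) {
        my ($type, $name) = splice @pairs, 0, 2;
        $template .= $type;          # concatenate into one unpack() string
        push @names, $name;
    }
    return ($template, \@names);
}

my ($tmpl, $names) = read_record_pairs('N' => 'length', 'A4' => 'tag');
# $tmpl is now "NA4" and @$names is ("length", "tag")
```

The resulting template and name list could then be handed to the generic "ReadRecord" routine described above.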

Now, where is the problem? The problem comes with finding the binary size of such a record. Before I can parse a record into a hash, I have to read enough bytes from disk. Turbo Pascal has the SizeOf() pseudo-function, but Perl doesn't seem to have an equivalent. I _could_ hack up my own unpack()-string parser that looks at such a string and tells me how many bytes it would need to unpack correctly, but I would like to know if there is a better way before resorting to such an ugly kludge ...
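One possible shortcut (my suggestion, not something from this thread), assuming the template is fixed-size with no "*" counts: pack() dummy values through the template and measure the resulting string, since pack() silently ignores surplus list elements:

```perl
# Measure a fixed-size record template by packing dummy values.
# A generous run of zeros covers any reasonable number of fields,
# because pack() ignores list elements the template does not consume.
my $template = "NnCA4";                  # hypothetical record layout
my $size = length pack $template, (0) x 32;
print "record size: $size\n";            # 4 + 2 + 1 + 4 = 11 bytes
```

This sidesteps writing a parser entirely, at the cost of only working for templates without variable-length items.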

My current solution
by Anonymous Monk on Mar 16, 2000 at 14:26 UTC
    I have now decided to implement a (maybe temporary) solution, and it isn't as hard or kludgy as I thought:
    sub RecSize {
        # Returns the length of the data required by a string for unpack().
        # Variable length (with "*") will result in an error.
        # Platform specific stuff like "i" and "I" also results in an error.
        my ($Data) = @_;
        my $Result = 0;
        my %Size = (
            "a" => 1, # A string with arbitrary binary data, will be null padded.
            "A" => 1, # An ascii string, will be space padded.
            "c" => 1, # A signed char value.
            "C" => 1, # An unsigned char value.
            "s" => 2, # A signed short value.
            "S" => 2, # An unsigned short value.
            "n" => 2, # A short in "network" (big-endian) order.
            "N" => 4, # A long in "network" (big-endian) order.
            "l" => 4, # A signed long value.
            "L" => 4, # An unsigned long value.
            "v" => 2, # A short in "VAX" (little-endian) order.
            "V" => 4, # A long in "VAX" (little-endian) order.
            "q" => 8, # A signed quad (64-bit) value.
            "Q" => 8, # An unsigned quad value.
            "x" => 1, # A null byte.
            "Z" => 1, # A zero terminated string (will need byte count!).
        );
        while ($Data =~ s/([aAbBhHcCsSlLnNvVqQxZ])(\d*)//) {
            my ($Type, $Repeat) = ($1, $2 || 1);
            if ($Type =~ /[bB]/) {          # bit strings: count is in bits
                $Result += int(($Repeat + 7) / 8);
            } elsif ($Type =~ /[hH]/) {     # hex strings: count is in nybbles
                $Result += int(($Repeat + 1) / 2);
            } elsif (my $Len = $Size{$Type}) {
                $Result += $Len * $Repeat;
            } else {
                $Result = undef;            # unknown or unsupported item
                last;
            }
        }
        return $Result;
    }

    sub decodeRecord {
        my ($Value, $Records, @Names) = @_;
        my @Values = unpack($Records, $Value);
        my %Result;
        foreach my $Name (@Names) {
            $Result{$Name} = $Values[0] if ($Name);
            shift @Values;
        }
        return \%Result;
    }
    This code is then used as follows:
    $strMacBinaryHeader = "CZ64A4A4AAvvvaaNNVVvaA4A8VVaaN";
    @nameMacBinaryHeader = (
        "ID", "filename", "filetype", "filecreator", "fileflags", undef,
        "yoffs", "xoffs", "fileid", "fileflags2", undef,
        "dataforklength", "resourceforklength", "creationdate",
        "lastmodified", "infolength", "fileflags3", "macbinary3id", undef,
        "totalunpackedlength", "secondaryheaderlength",
        "macbinary3requiredversion", "CRC"
    );

    sub decodeMacBinaryHeader {
        my ($Buffer) = @_;
        my $Result = &decodeRecord($Buffer, $strMacBinaryHeader,
                                   @nameMacBinaryHeader);
        roundUp( \$Result->{dataforklength}, 128 );
        return $Result;
    }

    $HeaderSize = &RecSize($strMacBinaryHeader);
    $Filename = shift || "../macicon/test/icon";
    open FILE, "< $Filename" or die "Error opening \"$Filename\" : $!\n";
    binmode FILE;
    read FILE, $MacBinHeader, $HeaderSize;
    $Header = decodeMacBinaryHeader($MacBinHeader);
    $ResourceForkStart = $HeaderSize + $Header->{secondaryheaderlength}
                       + $Header->{dataforklength};
    &dump( $Header );
    where &dump() is a simple routine that dumps a hash...
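    The &dump() routine itself isn't posted; a minimal stand-in (purely my guess at its behavior) could be:

```perl
# Hypothetical stand-in for the unshown &dump() routine: print each
# key/value pair of a hash reference, sorted by key for readability.
sub dump {
    my ($Hash) = @_;
    printf "%-28s => %s\n", $_, $Hash->{$_} for sort keys %$Hash;
}
```

    Note that dump is also a Perl built-in, so calling it with the leading ampersand, as above, makes sure the user-defined sub is invoked.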
RE: How to organize file records
by Dunx (Initiate) on Mar 14, 2000 at 19:56 UTC
    Are you committed to using binary data?
    Something I've found works quite well is to have one record per line, and to separate fields within the record with a separator of some kind (I usually use '#').
    Then it's easy to use <> to pull in records one at a time, and split to extract the fields.
    The obvious caveats are that this won't work properly with non-text data, that individual fields can't span multiple lines, and that no field may contain the separator (but escaping deals with those last two points). It works well for web data though.
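    A minimal sketch of that scheme (the field names and sample data are invented):

```perl
# One record per line, fields separated by '#': read records one at a
# time with <> (here the DATA handle) and break each apart with split.
while (my $line = <DATA>) {
    chomp $line;
    my ($name, $email, $age) = split /#/, $line;
    print "$name <$email>, age $age\n";
}

__DATA__
alice#alice@example.org#30
bob#bob@example.org#25
```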
      Well, the problem is that the files are not really database files but files whose format is already defined externally (for example, .ZIP archives, .MP3 files or, in this special case, MacBinary files). I have good, working Pascal code that reads the information from these files, but I'd like to port that work to Perl to make it more portable.

      Perl would have the easy extensibility of a scripting language, but it has the problem of reading and parsing the record in the first place ...
Re: How to organize file records
by jerji (Novice) on Mar 15, 2000 at 02:45 UTC
    If the problem here is that you need to know how to read a specific number of bytes for some pre-defined record size, you can use Perl's built-in read() function:
    my $REC_SIZE = 16; # 16 byte records
    my $buf;           # location to store one record
    my @records;       # array of records

    while (read FH, $buf, $REC_SIZE) {
        push @records, $buf;
    }
    You may also need to use binmode() depending on your operating system. And btw, you can use Perl's length() function to determine the size of scalars.
      Thanks for the information on how to actually read files - it seems I didn't make my problem clear enough :) I already know the basics (and some more) of file handling in Perl. What I do not know is how one generally handles reading from files that are structured as records, or even less structured (i.e. records of different lengths). Your proposal to hardcode the record size strikes me as a bad idea, since I would have to recalculate the record size each time I find some new information about the file.

      As I watch this discussion unfold, it comes to me that maybe my first kludgy idea of parsing the string passed to unpack() wasn't that bad after all ...