SkipHuffman has asked for the wisdom of the Perl Monks concerning the following question:

I think that I am trying to do something in a relatively portable way.

I have a data set where a single file contains different types of records. Each record type has a strictly defined format of fixed-length fields. The first four fields are common to all record types, with the third field defining the type of record. (The first is a date, the second is a sequence within that date, and the fourth is a sequence within the record type. None of these is relevant, which is why this comment is parenthetical.) I need to be able to both read and write these records.

I have this code

my ($RequestYear, $RequestMonth, $RequestDay, $RequestSequence,
    $RecordType, $RecordSequence, $Record)
    = unpack("A2 A2 A2 A4 A2 A2 A1486", $_);

This reads the header fields. Then I check $RecordType and use a series of elsif clauses to decide how to read the rest of the record, like this (a short example; most record types have more fields):

elsif ($RecordType == 94) {
    my ($Destination, $DestinationType) = unpack("A8 A40", $Record);
}

I would like to split the packing/unpacking function from the data definition so that I can put it all in one place, and reuse it. Something like this:

@HeaderRecordNames = qw/$RequestYear $RequestMonth $RequestDay $RequestSequence $RecordType $RecordSequence $Record/;
@HeaderRecordLengths = qw/A2 A2 A2 A4 A2 A2 A1486/;
...
@RecordNames[94] = qw/$Destination $DestinationType/;
@RecordLengths[94] = qw/$A8 A4/;
...
my (@HeaderRecordNames) = unpack(@HeaderRecordLengths, $_);
my (@RecordNames[$Recordtype]) = unpack(@RecordLengths[$RecordType]);

This would eliminate my chain of elsif clauses and also let me keep the data definitions in one place, so that I can use them for packing or other purposes, or change them as needed.

Of course this does not work because I am too novicelike to manage the dreaded symbolic references.

Thoughts?

Skip Huffman

Re: Dreaded Symbolic References
by Limbic~Region (Chancellor) on Mar 08, 2004 at 16:35 UTC
    SkipHuffman,
    Try something like the following:
    #!/usr/bin/perl
    use strict;
    use warnings;
    my $foo;    # Replace with $_ accordingly
    my %data;
    my %table = ( 94 => "A8A40" );
    my @fields = qw(Year Month Day Request_Seq Record_Type Record_Seq Record_Data);
    @data{ @fields } = unpack "A2A2A2A4A2A2A1486", $foo;
    my ($Destination, $DestinationType) = unpack $table{ $data{Record_Type} }, $data{Record_Data};
    Cheers - L~R

      Beauty!

      That is exactly the direction that I needed. My code is now like this:

      my %Field;
      my @HeaderFields = qw(RequestYear RequestMonth RequestDay RequestSequence RecordType RecordSequence Record);
      my %RecordMap = (Preface => 'A2A2A2A4A2A2A1486', 94 => 'A8A40');
      ...
      @Field{@HeaderFields} = unpack($RecordMap{Preface}, $_);
      ...

      Now just to change the rest of the mess into the nice clean hashes. (but first, lunch!)

      Thanks LR (and all the other monks)

      Skip

        In order to avoid potential mismatches between the field names and the pack template values, I would think about using a hash to generate both the unpack template and the field names. The only problem is that hashes do not keep their entries in any particular order. To get around this, one could store the "hash" in an array and separate the keys and values "manually." Example:

        my @fields = qw(Year A2 Month A2 Day A2 Request_Seq A4 Request_Type A2
                        Record_Seq A2 Record_Data A1486);
        my @keys   = map $fields[$_], grep $_ % 2 == 0, 0 .. $#fields;
        my @values = map $fields[$_], grep $_ % 2 == 1, 0 .. $#fields;
        my %data;    # bad name. we should know better. ;)
        @data{ @keys } = unpack join("", @values), $foo;

        Alternatives to this include, but are not limited to:

        • Use a tied hash that maintains order, such as Tie::IxHash.
        • Use a normal hash, but also keep an array alongside it that records the order of the keys. This is essentially what a tied hash would do for you automatically.
        • Some other extremely clever solution I have not thought of.
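A sketch of the second alternative, using field names similar to the example above: the array fixes the order, the ordinary hash holds the templates, and a name without a template is caught before anything is unpacked. The sample line is made up for illustration.

```perl
use strict;
use warnings;

# Field names in record order live in an array...
my @order = qw(Year Month Day Request_Seq Record_Type Record_Seq Record_Data);

# ...and their unpack templates live in an ordinary, unordered hash.
my %format = (
    Year        => 'A2',
    Month       => 'A2',
    Day         => 'A2',
    Request_Seq => 'A4',
    Record_Type => 'A2',
    Record_Seq  => 'A2',
    Record_Data => 'A1486',
);

# Every name in the order list must have a template.
exists $format{$_} or die "No template for field '$_'\n" for @order;

# The hash slice returns the templates in the order given by @order.
my $template = join '', @format{@order};

my $line = '040308' . '0001' . '94' . '01' . 'payload';   # made-up record
my %data;
@data{@order} = unpack $template, $line;
print "$data{Record_Type}: $data{Record_Data}\n";
```

Since @order is the only place the sequence is stated, reordering or adding a field means touching the array and the hash, but never a hand-written template string.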
Re: Dreaded Symbolic References
by kvale (Monsignor) on Mar 08, 2004 at 16:31 UTC
    The way I would abstract variant record types is with a hash that you could reuse:
    my %unpack = (
        preface => "A2 A2 A2 A4 A2 A2 A1486",
        94      => "A8 A40",
        ...
    );
    my @HeaderRecord = unpack $unpack{preface}, $FullRecord;
    my @RecordFields = unpack $unpack{$RecordType}, $Record;

    -Mark

      Ah, but I really want the names. What I want to avoid is having to remember exactly what field 35 of record type 22 holds. I think tracking that I am working on a NameAddress record and reading HomeZipCode is going to be a lot easier to maintain.

      Thanks,

      Skip
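Picking up the point about wanting names rather than positions, a minimal sketch (not from the thread) of keeping names and templates together in one table and using that table for both reading and writing. The %layout table, the template_for/read_fields/write_fields helpers, and the sample record are all hypothetical; the header's last field uses a* instead of the question's A1486 so that trailing spaces in the payload survive a round trip.

```perl
use strict;
use warnings;

# Hypothetical layout table: an ordered list of [name, template] pairs per
# record type. 'header' is the common prefix; type 94 is the only body shown.
my %layout = (
    header => [
        [ RequestYear     => 'A2' ],
        [ RequestMonth    => 'A2' ],
        [ RequestDay      => 'A2' ],
        [ RequestSequence => 'A4' ],
        [ RecordType      => 'A2' ],
        [ RecordSequence  => 'A2' ],
        [ Record          => 'a*' ],   # raw remainder; A1486 in the question
    ],
    94 => [
        [ Destination     => 'A8'  ],
        [ DestinationType => 'A40' ],
    ],
);

# Look up one layout and join its templates into a single template.
sub template_for {
    my ($type) = @_;
    my $spec = $layout{$type} or die "Unknown record type '$type'\n";
    return ($spec, join ' ', map { $_->[1] } @$spec);
}

# Read: unpack a string into a hashref keyed by field name.
sub read_fields {
    my ($type, $string) = @_;
    my ($spec, $template) = template_for($type);
    my @names = map { $_->[0] } @$spec;
    my %fields;
    @fields{@names} = unpack $template, $string;
    return \%fields;
}

# Write: pack a hashref back into a fixed-width string (A pads with spaces).
sub write_fields {
    my ($type, $fields) = @_;
    my ($spec, $template) = template_for($type);
    return pack $template, map { $fields->{ $_->[0] } // '' } @$spec;
}

my $line   = '040308' . '0001' . '94' . '01' . pack('A8 A40', 'LONDON', 'AIRPORT');
my $header = read_fields(header => $line);
my $body   = read_fields($header->{RecordType}, $header->{Record});
print "$body->{Destination} / $body->{DestinationType}\n";
```

Because A strips trailing spaces on unpack and pads with spaces on pack, the same template serves both directions.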

Re: Dreaded Symbolic References
by dada (Chaplain) on Mar 08, 2004 at 17:24 UTC
    @RecordNames[94] = qw/$Destination $DestinationType/;
    @RecordLengths[94] = qw/$A8 A4/;
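    This is wrong: @RecordNames[94] is a one-element slice, so when you assign a list (as given by qw/.../) to it, only the first item is kept. And since qw/.../ does not interpolate, what you're actually writing is:
    $RecordNames[94] = '$Destination';
    $RecordLengths[94] = '$A8';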
    And this will not work, of course. Furthermore, you should abandon the habit of populating a separate variable for each field you have (e.g. $Destination, $DestinationType). Use a hash instead:
    $fields[94] = [ qw/Destination DestinationType/ ];
    $lengths[94] = qq/A8 A4 /;
    my %data;
    @data{ @{ $fields[94] } } = unpack($lengths[94], $Record);
    # then you have:
    #
    #   $data{Destination}
    #   $data{DestinationType}
    hope this helps...

    cheers,
    Aldo

    King of Laziness, Wizard of Impatience, Lord of Hubris
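To see what Perl actually does with the question's lines, a short sketch: the one-element slice keeps only the first item of the qw/.../ list, and that item is the literal text '$Destination' because qw/.../ does not interpolate. The array names come from the question; depending on the Perl version, a "better written as" warning may also be printed for the slice.

```perl
use strict;
use warnings;

my (@RecordNames, @fields);

# A one-element slice on the left-hand side keeps only the first list item.
@RecordNames[94] = qw/$Destination $DestinationType/;

# qw// does not interpolate, so the stored value is the literal text '$Destination'.
print "$RecordNames[94]\n";

# An array reference keeps the whole list in a single element.
$fields[94] = [ qw/Destination DestinationType/ ];
print scalar(@{ $fields[94] }), " names stored for type 94\n";
```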

Re: Dreaded Symbolic References
by matija (Priest) on Mar 08, 2004 at 16:40 UTC
    I think it would be easier for you to understand and debug what you are doing if, instead of symbolic references, you used eval, something like this:
    my $assign;
    $assign  = "(" . join(",", @HeaderRecordNames) . ")=";
    $assign .= "unpack(\"" . join("", @HeaderRecordLengths) . "\", \$_);";
    print "Evaluating $assign\n";    # this is for debugging
    eval $assign;
    die $@ if $@;                    # report errors in the generated code
    While I assume (I didn't benchmark) that symbolic references would be somewhat quicker, I think this code is clearer and much easier to debug.
Re: Dreaded Symbolic References
by chromatic (Archbishop) on Mar 08, 2004 at 16:46 UTC
    I would like to split the packing/unpacking function from the data definition so that I can put it all in one place, and reuse it.

    Use a subroutine.
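One possible reading of "use a subroutine", sketched under assumptions: unpack_named and pack_named are hypothetical names, the field names are passed separately from the template, and the count check catches a name list and template that have drifted apart in length.

```perl
use strict;
use warnings;

# Unpack $string with $template and return a hashref keyed by @$names.
# Dies if the template and the name list yield different field counts.
sub unpack_named {
    my ($template, $names, $string) = @_;
    my @values = unpack $template, $string;
    @values == @$names
        or die "Template '$template' gives " . @values . " fields but "
             . @$names . " names were supplied\n";
    my %fields;
    @fields{@$names} = @values;
    return \%fields;
}

# The reverse: pack named fields back into one fixed-width string.
sub pack_named {
    my ($template, $names, $fields) = @_;
    return pack $template, map { $fields->{$_} // '' } @$names;
}

my @names = qw(Destination DestinationType);
my $rec   = pack 'A8 A40', 'LONDON', 'AIRPORT';
my $f     = unpack_named('A8 A40', \@names, $rec);
print "$f->{Destination}\n";
```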

Re: Dreaded Symbolic References
by waswas-fng (Curate) on Mar 08, 2004 at 16:32 UTC
    You can use a hash like a dispatch table, with each format stored as the value for the key that names the record type. Let me know if you need a code example, or whether this makes sense...


    -Waswas
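A minimal sketch of a dispatch table in the usual sense, with code refs as values rather than bare format strings; the handler for type 94, the handle_line helper, and the sample line are hypothetical. The header is skipped with x so that only the record type and the remainder are extracted.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: record type => code ref that decodes the
# type-specific remainder of a line. Only type 94 is filled in here.
my %handler = (
    94 => sub {
        my ($record) = @_;
        my %f;
        @f{qw(Destination DestinationType)} = unpack 'A8 A40', $record;
        return \%f;
    },
);

# Skip the 10-character date/sequence prefix, take the 2-character type,
# skip the 2-character record sequence, and keep the rest unmodified.
sub handle_line {
    my ($line) = @_;
    my ($type, $record) = unpack 'x10 A2 x2 a*', $line;
    my $code = $handler{$type}
        or die "No handler for record type '$type'\n";
    return $code->($record);
}

my $fields = handle_line('040308' . '0001' . '94' . '01'
                         . pack('A8 A40', 'LONDON', 'AIRPORT'));
print "$fields->{Destination}\n";
```

Storing only the format strings, as described above, works the same way; code refs just leave room for per-type logic beyond a single unpack.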
Re: Dreaded Symbolic References
by fizbin (Chaplain) on Mar 08, 2004 at 21:58 UTC
    Ain't financial data vendor file formats fun?

    For what it's worth, here's what I usually do when confronted with stuff like this: (well, sort of - I actually have a few modules now that make this kind of thing easier)

    my %recordformat = (
        '__COMMON__' => [qw(
            RequestYear     1-2
            RequestMonth    3-4
            RequestDay      5-6
            RequestSequence 7-10
            RecordType      11-12
            RecordSequence  13-14
        )],
        94 => [qw(
            Destination     15-22
            DestinationType 23-26
        )],
        # other stuff
    );
    sub fillHashFromData {
        my ($destination, $recordSpec, $data) = @_;
        my @spec = @$recordSpec;    # make a copy
        while (@spec) {
            my $key      = shift @spec;
            my $location = shift @spec;
            # (\d+) on both ends, so two-digit end positions like 13-14 parse correctly
            $location =~ /^(\d+)(?:-(\d+))?$/
                or die "Syntax error in recordSpec: @$recordSpec";
            my ($start, $end) = ($1, $2 || $1);
            $destination->{$key} = substr($data, $start - 1, $end - $start + 1);
        }
    }
    # Much later....
    my $dataLine;
    while ($dataLine = <VENDORFILE>) {
        my %datavalues;
        fillHashFromData(\%datavalues, $recordformat{__COMMON__}, $dataLine);
        fillHashFromData(\%datavalues, $recordformat{ $datavalues{'RecordType'} }, $dataLine);
        # Do stuff here
    }
    Now, if you really, really wanted things as plain variables instead of hash entries, that's easily done in the Do stuff here section:
    {
        no strict qw(vars refs);
        while (my ($k, $v) = each %datavalues) { $$k = $v; }
    }
    However, I'd stay away from that: keeping things in a hash makes it very easy to enumerate the fields if you ever want to.

    One distinct advantage of a %recordformat hash in the format above is that a section of it can be cut and pasted directly from the vendor's documentation. It also makes it easy to leave out the many fields you are ignoring, without having to make sure you have the right number of "x" specifications in the pack string. (Occasionally vendors will also give field lengths, but in my experience they always give character position ranges.)

    The subroutine that lets you construct a data line from a hash and a format is also pretty easy:

    sub fillDataFromHash {
        my ($src, $recordSpec, $data) = @_;
        $data ||= ' ' x $RECORD_LENGTH;    # Maybe add to %recordformat
        my @spec = @$recordSpec;
        while (@spec) {
            my $key      = shift @spec;
            my $location = shift @spec;
            $location =~ /^(\d+)(?:-(\d+))?$/
                or die "Syntax error in recordSpec: @$recordSpec";
            my ($start, $end) = ($1, $2 || $1);
            my $len = $end - $start + 1;
            # Pad or truncate to the field width so the record length never changes
            substr($data, $start - 1, $len) = sprintf('%-*.*s', $len, $len, $src->{$key} // '');
        }
        return $data;
    }
    # ...
    # fill %datavalues somehow
    my $dataLine = fillDataFromHash(\%datavalues, $recordformat{__COMMON__});
    $dataLine = fillDataFromHash(\%datavalues, $recordformat{ $datavalues{'RecordType'} }, $dataLine);
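The position-range spec can also be turned straight into an unpack template. A sketch, assuming 1-based inclusive ranges as in %recordformat: spec_to_template is a hypothetical helper that uses unpack's @ (jump to absolute position), so ignored fields are simply left out and there are no "x" counts to keep right. The sample line is made up.

```perl
use strict;
use warnings;

# Turn (name, "start-end") pairs, 1-based and inclusive, into field names
# plus an unpack template. '@N' jumps to absolute offset N, so any field
# that is not listed is skipped without counting characters.
sub spec_to_template {
    my @spec = @_;
    my (@names, @parts);
    while (@spec) {
        my ($name, $range) = splice @spec, 0, 2;
        my ($start, $end) = $range =~ /^(\d+)(?:-(\d+))?$/
            or die "Bad range '$range' for field $name\n";
        $end //= $start;
        push @names, $name;
        push @parts, '@' . ($start - 1) . ' A' . ($end - $start + 1);
    }
    return (\@names, join ' ', @parts);
}

# Only two fields are wanted; everything else on the line is ignored.
my ($names, $template) = spec_to_template(qw(
    RecordType      11-12
    DestinationType 23-26
));

my $line = '040308' . '0001' . '94' . '01' . 'LONDON  ' . 'PORT';   # made-up
my %f;
@f{@$names} = unpack $template, $line;
print "$template -> $f{RecordType} $f{DestinationType}\n";
```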
Re: Dreaded Symbolic References
by UnderMine (Friar) on Mar 08, 2004 at 17:27 UTC
    @HeaderRecordNames = qw/$RequestYear $RequestMonth $RequestDay $RequestSequence $RecordType $RecordSequence $Record/;
    @HeaderRecordLengths = qw/A2 A2 A2 A4 A2 A2 A1486/;
    ...
    my (@HeaderRecordNames) = unpack(@HeaderRecordLengths, $_);
    my (@RecordNames[$Recordtype]) = unpack(@RecordLengths[$RecordType]);
    Try using an intermediate stage for the mapping of values.
    @HeaderRecordNames = qw/RequestYear RequestMonth RequestDay RequestSequence RecordType RecordSequence Record/;
    $HeaderRecordLengths = q/A2A2A2A4A2A2A1486/;
    ...
    my @head   = unpack($HeaderRecordLengths, $_);
    my %Header = map { $HeaderRecordNames[$_] => $head[$_] } (0 .. $#head);
    my @data   = unpack($RecordLengths[ $Header{RecordType} ], $Header{Record});
    # assumes $RecordNames[$type] holds an array ref of field names for each record type
    my %Record = map { $RecordNames[ $Header{RecordType} ][$_] => $data[$_] } (0 .. $#data);
    This should work and create hashes containing the key => value pairs you are after, for both the header and the record.
    Note: Untested
    UnderMine