A more elegant use of unpack

maida has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I am parsing several large text files and am using unpack to separate some fixed-length records. A fellow Perl Monk suggested that I look for a more elegant use of unpack. So here I am, and this is what I have.


__DATA__
AP040003EZ9891783    61125    N        BX    108.0 0000   03196       
+      00000   D    Y     B
BP041303DD554        009J0    N        BX      8.7 5000   03168       
+62    00000        Y     W


___PART OF THE CODE________

   elsif (/^\s+?\S{13}\s+?\S+?\s+?\S/){
      $_ =~ s/^\s*//;
      my @fields = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $_;

        print "      PIIN       \= $fields[0]\n";
        print "      FSCM       \= $fields[1]\n";
        print "      N/A        \= $fields[2]\n";
        print "      U/I        \= $fields[3]\n";
        print "      UNIT PRICE \= $fields[4]\n";
        print "      AWD DT     \= $fields[5]\n";
        print "      QTY        \= $fields[6]\n";
        print "      OPT DT     \= $fields[7]\n";
        print "      FOB        \= $fields[8]\n";
        print "      REP        \= $fields[9]\n";
        print "      TYPE       \= $fields[10]\n";
        print "\n";
   }

Thanks in advance. -Shawn


Replies are listed 'Best First'.
Re: A more elegant use of unpack by wfsp (Abbot) on Sep 03, 2004 at 12:20 UTC
After looking at the docs again, the 'a's should be 'A's:

   a  A string with arbitrary binary data, will be null padded.
   A  A text (ASCII) string, will be space padded.

See also: perlpacktut in the docs.

Update: Added reference to tutorial.
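The quoted descriptions cover packing; when unpacking, the practical difference is that `A` strips trailing spaces and nulls from each field, while `a` returns the bytes unchanged. A minimal sketch, assuming a made-up 12-character record (the record and the variable names are illustrative, not from the original post):

```perl
use strict;
use warnings;

# Hypothetical record: an 8-column field followed by a 4-column field.
my $record = "AB12    XY  ";

# 'a' returns each field verbatim, padding included.
my @raw = unpack "a8 a4", $record;

# 'A' strips trailing spaces (and nulls) from each field.
my @trimmed = unpack "A8 A4", $record;

print "a: [$raw[0]] [$raw[1]]\n";          # a: [AB12    ] [XY  ]
print "A: [$trimmed[0]] [$trimmed[1]]\n";  # A: [AB12] [XY]
```

Note that `A` only trims trailing padding; a right-aligned value keeps its leading spaces either way.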
Re: A more elegant use of unpack by clscott (Friar) on Sep 03, 2004 at 16:42 UTC
Just a personal preference, but I would do:

   # Quoted list: qw would split names such as 'UNIT PRICE' into two names.
   my @field_names = (
       'PIIN', 'FSCM', 'N/A', 'U/I', 'UNIT PRICE', 'AWD DT',
       'QTY', 'OPT DT', 'FOB', 'REP', 'TYPE',
   );
   my $pack_defn = 'A21 A9 A9 A2 A14 A8 A9 A9 A4 A5 A6';
   my %fields;
   @fields{@field_names} = unpack($pack_defn, $_);
   foreach (@field_names) {
       print "\t$_\t\= ", $fields{$_}, "\n";
   }

My changes keep the field names and the unpack definitions closer together, put the values into a hash with appropriately named keys, and remove the repeated printing code (use formats if you want better column alignment). It may be important to note that this is not as efficient as the way you are currently doing it.

As wfsp noted, your 'a's should be 'A's, and you are one character off in the 5th field (counting from one). Additionally, the regexp in your elsif line does not match any of your sample data lines.

-- Clayton
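One way to keep names and widths even closer together is to store them as pairs and derive the unpack template from that list. The following is only a sketch under stated assumptions: `@layout` and `parse_record` are hypothetical names, the widths follow the `A21 A9 A9 A2 A14 ...` template above, and the sample line is built with `pack` from made-up values so that the column widths are exact. Names containing a space are quoted explicitly, since `qw` splits on whitespace.

```perl
use strict;
use warnings;

# Field layout as [name, width] pairs (widths taken from the template above).
my @layout = (
    [ 'PIIN',       21 ],
    [ 'FSCM',        9 ],
    [ 'N/A',         9 ],
    [ 'U/I',         2 ],
    [ 'UNIT PRICE', 14 ],
    [ 'AWD DT',      8 ],
    [ 'QTY',         9 ],
    [ 'OPT DT',      9 ],
    [ 'FOB',         4 ],
    [ 'REP',         5 ],
    [ 'TYPE',        6 ],
);

my @names    = map { $_->[0] } @layout;
my $template = join ' ', map { "A$_->[1]" } @layout;

# parse_record is a hypothetical helper: it returns a hash reference
# mapping each field name to its trimmed value.
sub parse_record {
    my ($line) = @_;
    my %fields;
    @fields{@names} = unpack $template, $line;
    return \%fields;
}

# Made-up record, space padded to the exact widths by pack.
my $line = pack $template,
    'AP040003EZ9891783', '61125', 'N', 'BX', '108.0 0000',
    '03196', '', '00000', 'D', 'Y', 'B';

my $rec = parse_record($line);
printf "%-10s = %s\n", $_, $rec->{$_} for @names;
```

With this arrangement, correcting a width (such as the 5th field) touches exactly one line, and the name list and template cannot drift apart.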
Re^2: A more elegant use of unpack by mifflin (Curate) on Sep 03, 2004 at 18:32 UTC
deleted by mifflin
Re: A more elegant use of unpack by Aristotle (Chancellor) on Sep 03, 2004 at 16:29 UTC
First, since you check the string with an initial match, there's no need to `s///` it separately to trim the whitespace: just capture the part you're interested in and use it directly. Also, all those lazy quantifiers should be greedy: what follows each of your plus quantifiers can never be matched by what is quantified (i.e. `\s` can never match `\S` and vice versa), so greedy vs. lazy does not change the match semantics. Greedy is both more efficient and makes for less clutter. I'd add an `/x` for good measure.

   elsif ( /^ \s+ ( \S{13} \s+ \S+ \s+ \S.* )/x ) {
       my @fields = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $1;
       # ...
   }

What follows in your case has a lot of repetition: the print, the formatting whitespace, and the reference to `@fields` are duplicated over and over. You can do better than that:

   elsif ( /^ \s+ ( \S{13} \s+ \S+ \s+ \S.* )/x ) {
       # Quoted list: qw would split names such as 'UNIT PRICE' into two names.
       my @field = (
           'PIIN', 'FSCM', 'N/A', 'U/I', 'UNIT PRICE', 'AWD DT',
           'QTY', 'OPT DT', 'FOB', 'REP', 'TYPE',
       );
       my %value;
       @value{ @field } = unpack "a21 a9 a9 a2 a13 a8 a9 a9 a4 a5 a6", $1;
       printf " %-10s = %s\n", $_, $value{ $_ } for @field;
       print "\n";
   }

Makeshifts last the longest.
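The claim that lazy and greedy quantifiers accept the same lines here can be checked on a few inputs. This is a rough sketch, not a proof: the three lines below are hypothetical (they are not the thread's sample data), and `extract_record` is an illustrative helper name.

```perl
use strict;
use warnings;

# The original lazy pattern and the greedy, capturing /x version.
my $lazy   = qr/^\s+?\S{13}\s+?\S+?\s+?\S/;
my $greedy = qr/^ \s+ ( \S{13} \s+ \S+ \s+ \S.* )/x;

# Hypothetical input lines.
my @lines = (
    "   AP040003EZ989   61125   N   BX",   # leading whitespace, 13-char first token
    "AP040003EZ989   61125   N   BX",      # no leading whitespace
    "   SHORT   61125   N",                # first token shorter than 13 characters
);

# extract_record is a hypothetical helper: it returns the captured text
# (leading whitespace already dropped) or undef when the line does not match.
sub extract_record {
    my ($line) = @_;
    return $line =~ $greedy ? $1 : undef;
}

for my $line (@lines) {
    my $lazy_ok = ( $line =~ $lazy ) ? 'yes' : 'no';
    my $record  = extract_record($line);
    my $shown   = defined($record) ? "[$record]" : '(no match)';
    print "lazy matches: $lazy_ok, greedy capture: $shown\n";
}
```

Only the first line matches either pattern, and the capture can be handed to `unpack` directly without a separate `s///`. Both patterns require leading whitespace and a first token of exactly 13 non-space characters, so a line that starts in column one is rejected by both.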