If the lines in your file are all the same length (even the blank ones) and each field is also fixed length, then you can use the fact that unpack formats ignore whitespace, in conjunction with the @nnn absolute position specifier, to lay out the format in a manner that is relatively easy to write and later read.

Fixing the start of each line using @nnn, and then using relative lengths and runs of x (skip one byte each) to position things within each line, greatly simplifies working out your format. In the example below, I've left the original 'specification' lines interleaved for readability and then used a regex to strip them out before using the format. There are several ways of doing this, and the much-maligned HEREDOC might actually be a better way. You could also omit them if you preferred.

Using qw// in conjunction with a hash slice to name the individual fields is a nice way of laying out the field names in a clear and concise manner. You can then assign a ref to the hash into another hash indexed by personnel id (or lastname etc.), giving you a HoH, which is probably the best structure for this kind of work.

my $unpack_format = '
@0  A8      xxA9       xxxxxxxxA8      xxxxxxxxxxxa5
    LastName, FirstName  Home <hm phone>   field <value>
@53 A19                xxxxxxxxA8      xxxxxxxxxxxa5
    Address1             Work <wk phone>   field <value>
@105A38                                   xxxxxxxxA5
    Address2                               field <value>
@157A4  xxA5   xxA8      xxxxxxxxA5   xxxxxxxxxA5
    City, State, zip      field <value> field <value>
@261xxxxxxxA5   xxxxxxxxxxxxxxA5   xxxxxxxxxxxxxxxxA5
    field <value>      field <value>        field <value>
@313xxxxxxxA5   xxxxxxxxxxxxxxA5   xxxxxxxxxxxxxxxxA5
    field <value>      field <value>        field <value>
@365xxxxxxxA5   xxxxxxxxxxxxxxA5   xxxxxxxxxxxxxxxxA5
    field <value>      field <value>        field <value>
';

## Remove the "documentation" lines (those that start with a space or tab).
## Note =~ rather than =, and [ \t] rather than \s so that the leading
## newline of the string cannot swallow the first format line.
$unpack_format =~ s[^[ \t].*\n][]mg;

my %personnel;

# Assuming that the personnel number is an integral part of the filename
for my $filename ( <PID*.rec> ) {
    # A hash subscript is scalar context, so capture the id in list context first.
    my( $pid ) = $filename =~ m[PID(\d+)\.rec] or next;

    my %record;
    @record{ qw[
        lastname firstname homephone field1
        address1 workphone field2
        address2 field3
        city state zip field4 field5
        field6 field7 field8
        field9 field10 field11
        field12 field13 field14
    ] } = unpack $unpack_format, slurp_file( $filename );

    # slurp_file() could be a function, or
    #   do { local( @ARGV, $/ ) = $filename; <> }
    # Search for [Juerd]s "Cheap idioms" node for details.

    $personnel{ $pid } = \%record;
}
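If you want to try the technique without the real files, here is a cut-down sketch of the same idea. The two-line record, its 20-character line length and the field names are all made up for illustration; only the mechanics (whitespace in the template, @nnn, the stripped documentation lines and the hash slice) are the same as above.

```perl
use strict;
use warnings;

# Hypothetical record: two lines, each 20 characters plus a newline.
my $record = sprintf( "%-20s\n" x 2,
    'Smith, John  <12345>',
    'Springfield  <67890>',
);

# Format lines start in column 0; the indented lines are documentation only.
my $format = '
@0  A5   xxA4  xxxa5
    Smith, John  <12345>
@21 A11        xxxa5
    Springfield  <67890>
';
$format =~ s[^[ \t].*\n][]mg;    # drop the indented documentation lines

# Name the fields with a hash slice (field names are assumptions).
my %rec;
@rec{ qw[ lastname firstname id city code ] } = unpack $format, $record;

print "$_ => '$rec{$_}'\n" for sort keys %rec;
```

The second line starts at @21 because each line occupies 20 characters plus its newline.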

However, I strongly suspect that lines like

City, State, zip     field <value> field <value>

in your specification don't indicate that City is a 4-byte field or State a 5-byte one, but that within the overall 21-byte field allocated to them, the parts are variable length, separated by commas.

unpack has no mechanism for dealing with this variable-width-within-a-fixed-length-field type of data. You would need to unpack all three parts as a single field and then use split or a regex to subdivide it later, which is a pain.
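A minimal sketch of that two-step approach, assuming a made-up line in which the 21-byte "City, State, zip" field is followed directly by "field <" and a 5-byte value (the sample data and variable names are mine, not from your files):

```perl
use strict;
use warnings;

# Hypothetical line: a 21-byte "City, State, zip" field, then one value.
my $line = sprintf "%-21sfield <%-5s>", 'Boston, MA, 02134', 'abcde';

# Pull the 21 bytes out as one field, skip "field <" (7 bytes), take the value.
my( $csz, $value ) = unpack 'A21 x7 A5', $line;

# Then subdivide on the commas, tolerating optional spaces around each one.
my( $city, $state, $zip ) = split /\s*,\s*/, $csz;

print "city='$city' state='$state' zip='$zip' value='$value'\n";
```

It works, but every such composite field costs you an extra statement and an extra set of variables.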

In this case, I would probably go for a big regex, though that doesn't mean it has to be complicated or hard to write. And using the /x modifier and embedded comments, it can become self-documenting. Start from your record specification and interleave the elements of the regex, lined up as best you can with those lines, and you get a reasonably readable layout. Because /x ignores literal whitespace in the pattern, you need to add \s* or \s+ between the elements, which has the effect of making the regex look 'noisy', but it's a trade-off against the self-documentation provided by the embedded comments and alignment. Overall I find this quite readable, but YMMV.

my $re_record = qr[
#LastName, FirstName  Home <hm phone>   field <value>
 ([^,]+) ,\s* (\S+) \s+ Home\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
#Address1             Work <wk phone>   field <value>
 (.*?)              \s* Work\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
#Address2                               field <value>
 (.*?)                                \s* field\s< ([^>]+) > \s*\n
#City, State, zip      field <value> field <value>
 ([^,]+) ,\s* ([^,]+) ,\s* (.*?) \s* field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
#(blank line)
 \s*\n
#field <value>      field <value>        field <value>
 field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
#field <value>      field <value>        field <value>
 field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
#field <value>      field <value>        field <value>
 field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
]x;

my $record = slurp_file( $filename );
my %record;
@record{ qw[ .... ] } = ( $record =~ $re_record );
$personnel{ .... } = \%record;

You could use a repeat count for the last 3-lines/9-fields, but if the data has a fixed number of fields, I think that using cut&paste makes things clearer in this instance.

One tip: If you decide to go the 'big regex' route, start by commenting out everything but the first line, capture to an array and print the results. Once you have that capturing the right things, uncomment the second and repeat. Simple advice, but it took me a while to work it out.
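As a rough illustration of that workflow, here is a sketch with only the first line of the regex active. The sample record is invented, and the second line of the pattern is left commented out until the first one captures what you expect:

```perl
use strict;
use warnings;

# Hypothetical two-line sample in the layout discussed above.
my $sample = "Smith, John  Home <555-1234>   field <aaaaa>\n"
           . "12 Main St   Work <555-9999>   field <bbbbb>\n";

my $re = qr[
    #LastName, FirstName  Home <hm phone>   field <value>
    ([^,]+) ,\s* (\S+) \s+ Home\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
    ## Second line stays commented out until the first captures correctly:
    ## (.*?) \s* Work\s< ([^>]+) > \s+ field\s< ([^>]+) > \s*\n
]x;

# Capture to an array and print, so a wrong or missing capture is obvious.
my @captured = $sample =~ $re;
print scalar( @captured ), " fields: [", join( '][', @captured ), "]\n";
```

An empty list here means the active part of the regex failed to match at all, which narrows the problem down to the line you just uncommented.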


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to Re: Re: Re: opposite of format? by BrowserUk
in thread opposite of format? by xChauncey
