Yuck! I thought (hoped) that this type of file format -- mixed, fixed-format records -- had died long ago; but they seem to keep reinventing it :)
For your first example, the trick is to define a regex that will match the fields in the header line:
my $reHeader = '(\b\w+\s*)?' x 10; ## Adjust the repeat value to cover the maximum no of fields
and use that to construct an unpack template to parse the following values line.
This is not 'nice code', but it demonstrates the technique:
#! perl -slw
use strict;
use Data::Dump qw[ pp ];

my $reHeader = '(\b\w+\s*)?' x 10;

my %data;
until( eof( DATA ) ) {
    ## Read the header line and remove the newline
    chomp( my $header = <DATA> );

    ## parse the fields using the regex, ignoring undefined fields
    my @keys = grep defined, $header =~ $reHeader;

    ## trim the trailing whitespace from the keys
    s[\s*$][] for @keys;

    ## Use the capture position arrays (@- & @+)
    ## to work out the field widths and construct a template
    my $tmpl = join ' ', map{
        defined( $-[$_] )
            ? do{ my $n = $+[$_] - $-[$_]; "a$n" }
            : ()
    } 1 .. $#+;

    ## read and chomp the values line
    chomp( my $vals = <DATA> );

    ## Extract the value fields using the template
    my @vals = unpack $tmpl, $vals;

    ## trim leading & trailing whitespace
    s[^\s*][],s[\s*$][] for @vals;

    ## Add the key/value pairs to the hash
    @data{ @keys } = @vals;

    ## discard the blank line between the grouped pairs of lines.
    <DATA>;
}

pp \%data; ## display the hash constructed

__DATA__
TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
2       0                     5         3

HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
5       3       0                 3

CELLR    DIR     CAND  CS
LUC083A  MUTUAL  BOTH  NO
Outputs:
C:\test>junk79
{
  AWOFFSET    => 5,
  BQOFFSET    => 3,
  BQOFFSETAFR => 3,
  CAND        => "BOTH",
  CELLR       => "LUC083A",
  CS          => "NO",
  DIR         => "MUTUAL",
  HIHYST      => 5,
  LOHYST      => 3,
  OFFSETN     => "",
  OFFSETP     => 0,
  TRHYST      => 2,
  TROFFSETN   => "",
  TROFFSETP   => 0,
}
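To isolate the key step, here is a minimal sketch of just the offsets-to-template part, run on a single header line. The header and values strings are made up for illustration, and the two-space column gaps are an assumption about the layout, not something taken from your file.

```perl
use strict;
use warnings;

my $reHeader = '(\b\w+\s*)?' x 10;

## Hypothetical header line; the column gaps are an assumption
my $header = 'TRHYST  TROFFSETP  TROFFSETN';

## Match purely for the side effect of populating @- and @+
$header =~ $reHeader;

## Group N spans offsets $-[N] .. $+[N]; unmatched groups are undef
my @widths = map  { $+[$_] - $-[$_] }
             grep { defined $-[$_] } 1 .. $#+;

## One fixed-width 'a' field per header column
my $tmpl = join ' ', map { 'a' . $_ } @widths;

## Hypothetical values line with the middle column left empty
my $vals = '2' . ' ' x 18 . '7';

my @vals = unpack $tmpl, $vals;
s/^\s+|\s+$//g for @vals;

print "$tmpl\n";               ## prints: a8 a11 a9
print join( '|', @vals ), "\n"; ## prints: 2||7
```

Because each width includes the whitespace that trails the field name, an empty column still occupies its slot and comes back as an empty string rather than shifting the later values left.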
Extending that to apply it to all your other sections will require a little ingenuity and a lot of painstaking testing.
I do hope for your sake that the number and ordering of the different sections is well-defined, else you've got an even worse task on your hands.
Note: This assumes that field names do not contain spaces. If they do, you are in shit street.
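A quick sketch of why that assumption matters, using a made-up header in which 'CELL ID' is meant to be one field name: the regex has no way to tell an embedded space from a column gap, so it reports two fields where there is one.

```perl
use strict;
use warnings;

my $reHeader = '(\b\w+\s*)?' x 10;

## Hypothetical header: 'CELL ID' is intended as a single field name
my $header = 'CELL ID  DIR';

my @keys = grep defined, $header =~ $reHeader;
s/\s+$// for @keys;

## Two columns were intended, but three keys come back
print scalar( @keys ), " keys: @keys\n"; ## prints: 3 keys: CELL ID DIR
```

With the keys split like that, the unpack template gets an extra, too-narrow field and every value after it lands under the wrong key.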
In reply to Re^3: Reading tab/whitespace delimited text file
by BrowserUk
in thread Reading tab/whitespace delimited text file
by reaper9187