in reply to Re^2: Reading tab/whitespace delimited text file
in thread Reading tab/whitespace delimited text file
Yuck! I thought (hoped) that this type of file format -- mixed, fixed-format records -- had died long ago; but they seem to keep reinventing it :)
For your first example, the trick is to define a regex that will match the fields in the header line:
    my $reHeader = '(\b\w+\s*)?' x 10; ## Adjust the repeat value to cover the maximum no. of fields
and use that to construct an unpack template to parse the following values line.
This is not 'nice code', but it demonstrates the technique:
    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my $reHeader = '(\b\w+\s*)?' x 10;

    my %data;
    until( eof( DATA ) ) {
        ## Read the header line and remove the newline
        chomp( my $header = <DATA> );

        ## parse the fields using the regex, ignoring undefined fields
        my @keys = grep defined, $header =~ $reHeader;

        ## trim the trailing whitespace from the keys
        s[\s*$][] for @keys;

        ## Use the capture position arrays (@- & @+)
        ## to work out the field widths and construct a template
        my $tmpl = join ' ', map{
            defined( $-[$_] ) ? do{ my $n = $+[$_] - $-[$_]; "a$n" } : ()
        } 1 .. $#+;

        ## read and chomp the values line
        chomp( my $vals = <DATA> );

        ## Extract the value fields using the template
        my @vals = unpack $tmpl, $vals;

        ## trim leading & trailing whitespace
        s[^\s*][],s[\s*$][] for @vals;

        ## Add the key/value pairs to the hash
        @data{ @keys } = @vals;

        ## discard the blank line between the grouped pairs of lines.
        <DATA>;
    }

    pp \%data; ## display the hash constructed

    __DATA__
    TRHYST  TROFFSETP  TROFFSETN  AWOFFSET  BQOFFSET
    2       0                     5         3

    HIHYST  LOHYST  OFFSETP  OFFSETN  BQOFFSETAFR
    5       3       0                 3

    CELLR    DIR     CAND  CS
    LUC083A  MUTUAL  BOTH  NO
Outputs:
    C:\test>junk79
    {
      AWOFFSET    => 5,
      BQOFFSET    => 3,
      BQOFFSETAFR => 3,
      CAND        => "BOTH",
      CELLR       => "LUC083A",
      CS          => "NO",
      DIR         => "MUTUAL",
      HIHYST      => 5,
      LOHYST      => 3,
      OFFSETN     => "",
      OFFSETP     => 0,
      TRHYST      => 2,
      TROFFSETN   => "",
      TROFFSETP   => 0,
    }
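If it helps to see the template itself, the template-building step can be pulled out on its own. This is only a sketch: the sub name `header_to_template` and the sample header spacing are my own assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

## Hypothetical helper (name and interface are assumptions):
## returns the field names and the unpack template for one header line.
sub header_to_template {
    my( $header, $max ) = @_;
    my $re = '(\b\w+\s*)?' x ( $max || 10 );

    ## Each capture is a field name plus its trailing whitespace
    my @keys = grep defined, $header =~ $re;

    ## Field width = capture end offset minus capture start offset.
    ## Captures that did not take part in the match are undefined and skipped.
    my $tmpl = join ' ', map {
        defined( $-[$_] ) ? 'a' . ( $+[$_] - $-[$_] ) : ()
    } 1 .. $#+;

    ## Trim the trailing whitespace from the names
    s/\s+$// for @keys;

    return ( \@keys, $tmpl );
}

my( $keys, $tmpl ) = header_to_template( 'CELLR    DIR     CAND  CS' );
print "@$keys\n";   ## prints: CELLR DIR CAND CS
print "$tmpl\n";    ## prints: a9 a8 a6 a2
```

Each width runs from the start of one name to the start of the next, so a value that sits anywhere under its column header lands in the right field, and a blank column simply unpacks to an empty string.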
Extending that to apply it to all your other sections will require a little ingenuity and a lot of painstaking testing.
I do hope for your sake that the number and ordering of the different sections is well-defined, else you've got an even worse task on your hands.
Note: This assumes that field names do not contain spaces. If they do, you are in shit street.
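To see why, here is a small sketch that runs the same header regex over a made-up header whose first field name contains a space. The header text is hypothetical, purely for illustration:

```perl
use strict;
use warnings;

## Same header regex as in the post
my $reHeader = '(\b\w+\s*)?' x 10;

## Hypothetical header: the first field is meant to be 'CELL NAME'
my $header = 'CELL NAME  DIR';

my @keys = grep defined, $header =~ $reHeader;
s/\s+$// for @keys;

print scalar( @keys ), " keys: ", join( '|', @keys ), "\n";
## prints: 3 keys: CELL|NAME|DIR
```

The regex has no way of knowing that CELL and NAME belong together, so the one field is split in two and its value gets chopped across both keys.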