in reply to Extracting formatted text block
Here, I turn a line such as " ---- " into a pattern (regex) that looks like / (....) /

The transformation is accomplished via a set of very simple regexes. Once I have that, I use it to parse the fields out of the lines of data. I also use it to parse the header line. I remember the columns that the 'NAME' header appeared in, and only take those columns of data. Since there can be multiple sets of columns, I have to do a little kludgery to keep the first 'NAME' column of data separate from the second, and so on.
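As a minimal sketch of that transformation in isolation, the snippet below wraps the same substitutions in a helper and then uses the resulting pattern to pick out the 'NAME' columns. The helper name `picture_to_pattern` and the three sample lines are made up for illustration; they are not part of the script in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a picture line of dashes into a capturing regex.
sub picture_to_pattern {
    my ($pic) = @_;
    $pic =~ s/ -/ (-/g;    # open a group where a run of dashes follows a space
    $pic =~ s/^-/(-/;      # ... or starts the line
    $pic =~ s/- /-) /g;    # close a group where a run is followed by a space
    $pic =~ s/-$/-)/;      # ... or ends the line
    $pic =~ tr/-/./;       # each dash now matches any single character
    return $pic;
}

# Illustrative sample lines (assumed for this sketch only).
my $header  = 'CODE NAME  CODE NAME ';
my $picture = '---- ----- ---- -----';
my $data    = 'ABC  ONE   RST  EIGHT';

my $pat = picture_to_pattern($picture);    # (....) (.....) (....) (.....)

# Work out which captured fields sit under a 'NAME' header.
my @hdr = $header =~ /^$pat/;
s/\s+$// for @hdr;
my @col = grep { $hdr[$_] eq 'NAME' } 0 .. $#hdr;

# Keep only those fields from the data line.
my @names = ( $data =~ /^$pat/ )[@col];
s/\s+$// for @names;

print join( ',', @names ), "\n";    # prints "ONE,EIGHT"
```

Because the gutters stay as literal spaces in the pattern, a line with text in a gutter position simply fails to match, which is what lets a script like this notice where a block of data ends.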
(Update: Added a __DATA__ section and changed some variable names.)
    use strict;
    use warnings;

    my $gold_field = 'NAME';

    my( @gold_aoa, @col, $prev, $pat, $pad );

    # print the collected columns as one comma-separated line, then reset
    sub report
    {
        local $, = ',';
        local $\ = "\n";
        print map @$_, @gold_aoa;
        @gold_aoa = ();
    }

    while (<DATA>)
    {
        chomp;
        if ( /^[- ]+$/ )
        {
            # this line is a picture; turn it into a pattern
            $pad = ' ' x length($_); # overkill, I know.
            s/ -/ (-/g;
            s/^-/(-/g;
            s/- /-) /g;
            s/-$/-)/g;
            y/-/./;
            $pat = $_; # start looking for data.

            # prev contains header; pad it so a short last column still matches
            my @hdr = ( ( $prev // '' ) . $pad ) =~ /^$pat/;
            s/\s+$// for @hdr;
            @col = grep { $hdr[$_] eq $gold_field } 0 .. $#hdr;
            next;
        }
        if ( $pat )
        {
            $_ .= $pad;
            if ( my @dat = /^$pat/ )
            {
                @dat = @dat[@col];
                for my $i ( 0 .. $#dat )
                {
                    $dat[$i] =~ s/\s+$//;
                    length($dat[$i]) and push @{ $gold_aoa[$i] }, $dat[$i];
                }
            }
            else
            {
                # issue report
                report();
                # stop looking for data
                undef $pat;
            }
        }
        $prev = $_;
    }

    # issue a final report if the data ran to end of file
    report() if @gold_aoa;

    __DATA__
    Here's the original sample data:

    CODE     NAME                    CODE     NAME
    -------- ----------------------- -------- -----------------------
    ABC      NAME ONE                RST      NAME EIGHT ...
    DEF      NAME TWO THREE          WXY      NAME NINE - TEN
    GHIJK    NAME FOUR ...           ZAB      NAME ELEVEN
    LMN      NAME FIVE - SIX         CDE      NAME TWELVE
    OPQ      NAME SEVEN

    And here's another bunch of data. It all still works!

    CODE NAME  AGE CODE NAME      AGE CODE NAME        AGE
    ---- ----- --- ---- --------- --- ---- ----------- ---
    ABC  ONE   1   RST  EIGHT     19  RS0  VEGA        39
    DEF  TWO   2   WXY  NINE      23  WX0  SHELIAK     23
    DEJ  THREE 3   WXZ  TEN       29  WY0  SULAFAT     29
    GHI  FOUR  9   ZAB  ELEVEN    31  ZA0  AL ADFAR    31
    LMN  FIVE  10  CDE  TWELVE    37  CD0  AL ATHFAR   37
    LMS  SIX   13  OPQ  SEVEN     15
(Update: Added the following commentary.)
This solution was designed to be flexible (i.e. robust) in the face of variable numbers of column sets (i.e. your example showed two, but I wanted to allow for any), and variable column widths and gutter widths. Some of the other proposed solutions hard-code these parameters. I think solutions that key off of the 'INTERESTING CODE' line are particularly non-robust.
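To illustrate why deriving the layout from the picture line copes with changing widths and gutters, here is a hedged sketch that uses a different technique from the script in this post: it builds an `unpack` template from the picture rather than a regex. The helper names `picture_to_template` and `parse_with_picture`, and the sample lines, are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: each run of dashes becomes an 'A<width>' field,
# each run of spaces becomes an 'x<width>' skip.
sub picture_to_template {
    my ($pic) = @_;
    return join ' ',
        map { /^-/ ? 'A' . length($_) : 'x' . length($_) }
        $pic =~ /(-+| +)/g;
}

# Hypothetical helper: split one data line according to a picture line.
sub parse_with_picture {
    my ( $pic, $line ) = @_;
    my $tmpl = picture_to_template($pic);
    # Pad short lines so an 'x' skip never runs past the end of the string.
    my $short = length($pic) - length($line);
    $line .= ' ' x $short if $short > 0;
    return unpack $tmpl, $line;    # 'A' fields drop trailing spaces
}

# Two layouts with different column and gutter widths (illustrative data).
my @a = parse_with_picture( '---- -----',     'ABC  ONE' );
my @b = parse_with_picture( '--------   ---', 'GHIJK      42' );

print join( '|', @a ), "\n";    # prints "ABC|ONE"
print join( '|', @b ), "\n";    # prints "GHIJK|42"
```

One trade-off worth noting: `unpack` does not check that the gutters are blank, so unlike the regex approach it cannot use a failed match to detect the end of a block of data.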