Parse fixed-length ascii table

I got bored and gave this as an answer to a question today, but I figured it might have broader uses. (Not to me, of course, since I don't deal with data like this, but maybe for some of you.) It reads the column headings from the first line and uses them to extract fixed-width ASCII fields from the lines that follow. It assumes each heading is a single word: a field's width is the length of its heading plus the whitespace after it. The code comments include an alternative regex that allows multi-word headings.

use strict;
use warnings;
use Data::Dumper;

my (@keys, $format, @data, @sizes);

$_ = <DATA>;

# Assumes single-word headings separated by whitespace. A more
# robust regex, allowing multiple words in a heading but requiring
# at least 2 spaces between fields, could be:
#      /\G(\S+(?:\s\S+)*)\s+/g
while (/\G(\S+)\s+/g) {
    push(@keys, $1);
    # field width = heading plus the whitespace that follows it
    push(@sizes, $+[0]-$-[0]);
}

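# Build the unpack template. The last field is A* so that data wider
# than its heading is not cut off at the heading's width.
$format = join("", map { "A$_" } @sizes[0 .. $#sizes - 1]) . "A*";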

while (<DATA>) {

    # if you want a hashref for each line:
    my $i;
    push(@data, { map { ($keys[$i++], $_) } unpack($format, $_) } );

    # else, take out references to @keys above and just use this:
    # push(@data, [ unpack($format, $_) ] );
}

print Dumper(\@data);

__DATA__
Name            ID      PS      Gender     Age      Month    Code     Cap        Pool
LName, FName    99999   99      M          99.9     12/2000  Add      99.99      99.99
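
For anyone who wants to try the multi-word regex from the comments, here is a minimal sketch of that variant. The sample table and its headings ("First Name", "Home City") are made up for illustration, and it assumes every field is separated by at least two spaces while words inside a heading are separated by exactly one.

```perl
use strict;
use warnings;

# Hypothetical sample table: "First Name" and "Home City" are two-word
# headings; every field is separated by at least two spaces.
my @lines = split /^/, <<'END';
First Name  ID     Home City
Alice       10001  New York
Bob         10002  Reno
END

# Keep the newline on the header: the regex needs trailing whitespace
# after the last heading in order to match it.
my $header = shift @lines;
chomp @lines;

my (@keys, @sizes);
# Words joined by a single whitespace character stay in one heading.
while ($header =~ /\G(\S+(?:\s\S+)*)\s+/g) {
    push @keys,  $1;
    push @sizes, $+[0] - $-[0];   # heading plus trailing whitespace
}

my $format = join "", map { "A$_" } @sizes;

my @data;
for my $line (@lines) {
    my %row;
    @row{@keys} = unpack $format, $line;   # hash slice: one value per heading
    push @data, \%row;
}

print join(" | ", @{$_}{@keys}), "\n" for @data;
```

The hash slice replaces the `$i++` counter from the original loop; either approach should give the same hashrefs.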

Re (tilly) 1: Parse fixed-length ascii table by tilly (Archbishop) on Jan 05, 2001 at 15:24 UTC
It can be unreliable to depend on parsing the column names of a fixed-width ASCII table. Often there are other ways to find the field boundaries, for instance by locating the spaces between the runs of dashes right below the titles. For an example of how to do that, see Locate char in a string.
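
A rough sketch of the dash-ruler idea, for illustration only: this is not the code from the linked node, and the sample table is hypothetical. It assumes the second line is a ruler of dashes starting in the first column, with spaces between the runs marking the field boundaries.

```perl
use strict;
use warnings;

# Hypothetical sample: a ruler of dashes under the titles marks each column.
my @lines = split /^/, <<'END';
Full Name     ID     City
------------- ------ --------
Alice Smith   10001  New York
Bob           10002  Reno
END
chomp @lines;

my ($titles, $ruler, @rows) = @lines;

# Each run of dashes, plus the spaces after it, is one field.
my @sizes;
while ($ruler =~ /\G(-+)(\s*)/g) {
    push @sizes, length($1) + length($2);
}

# Let the last field run to the end of the line.
my $format = join("", map { "A$_" } @sizes[0 .. $#sizes - 1]) . "A*";

# The titles are unpacked with the same template as the data rows,
# so multi-word titles need no special handling.
my @keys = unpack $format, $titles;

my @data;
for my $row (@rows) {
    my %rec;
    @rec{@keys} = unpack $format, $row;
    push @data, \%rec;
}

print join(" | ", @{$_}{@keys}), "\n" for @data;
```

Because the widths come from the ruler rather than from the titles, a title that is shorter or longer than its data does not shift the field boundaries.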