Parsing a print file

jakeeboy has asked for the wisdom of the Perl Monks concerning the following question:

Thanks to everyone who gave me ideas on Input Formats. Unfortunately, this report isn't a data file, so I couldn't figure out how to use the Text modules, since they look for quoted strings and there are none in the print file. I still have a problem, though: I can't seem to parse the data out of this print file. It's set up like this:

```
Name            ID      PS        Gender   Age      Month    Code      Cap        Pool
LName, FName    99999   99        M        99.9     12/2000  Add       99.99      99.99
                                                    11/2000  New       99.99      99.99
```

The second line is a second record, and sometimes the Code column is empty. I wrote this snippet of code to find these data lines:

```perl
if ($line =~ /^(\w[A-Z0-9])+/ && length($line) >= 112) {
    # parse line
}
elsif ($line =~ /^\s{60}[A-Z0-9]/ && substr($line, 60, 3) ne "Inf") {
    # parse line
}
```

But since the second line doesn't have to have a code, there could be 9 more spaces. Right now I parse it as a fixed-length string, but I can't seem to get rid of the trailing spaces in each field. Is there a simpler way? Please let me know, and thanks, Jake

PS: I haven't written any HTML, but there are square brackets around the A-Z0-9; for some reason they don't show up and a link is created instead. Also, 11/2000 should show up underneath 12/2000 on the second line.
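The fixed-length approach Jake describes can be done with `substr` plus a trailing-blank trim on each field. A minimal sketch, assuming hypothetical column offsets (`@layout`, `$record_width` and `parse_fixed` are illustrative names, not taken from the report); with the trim in place, an empty Code column comes back as an empty string rather than a run of spaces.

```perl
use strict;
use warnings;

# Hypothetical column layout: [field, start column, width]. These numbers
# are an assumption for illustration; measure the real ones from the report.
my @layout = (
    [ name   =>  0, 16 ], [ id  => 16,  8 ], [ ps    => 24, 10 ],
    [ gender => 34,  9 ], [ age => 43,  9 ], [ month => 52,  9 ],
    [ code   => 61, 10 ], [ cap => 71, 11 ], [ pool  => 82,  8 ],
);
my $record_width = 90;

# Cut one fixed-width line into a hash ref, trimming each field.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    # Pad short lines so substr never reads past the end of the string.
    $line = sprintf "%-${record_width}s", $line;
    my %rec;
    for my $col (@layout) {
        my ($field, $start, $width) = @$col;
        (my $value = substr($line, $start, $width)) =~ s/\s+$//;  # drop trailing blanks
        $rec{$field} = $value;
    }
    return \%rec;
}

# A continuation line with an empty Code column, built in the assumed layout.
my $line = sprintf "%-16s%-8s%-10s%-9s%-9s%-9s%-10s%-11s%s",
    ('') x 5, '11/2000', '', '99.99', '99.99';
my $rec = parse_fixed($line);
printf "month='%s' code='%s' cap='%s'\n", @{$rec}{qw(month code cap)};
```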

Re: Parsing a print file by Fastolfe (Vicar) on Jan 05, 2001 at 06:20 UTC
I got bored and wrote something that'll parse your data and even figure out the format on the fly. Just be sure no headings have spaces in them. Leading and/or trailing whitespace in each field is left intact, so you'll have to trim that on your own if you need to. (Note that I've expanded upon this code here: Parse fixed-length ascii table)

```perl
use Data::Dumper;
my (@keys, $format, @data, @sizes);

$_ = <DATA>;
while (/\G(\S+)\s+/g) {
    push(@keys, $1);
    push(@sizes, $+[0]-$-[0]);
}
$format = join("", map { "a$_" } @sizes);

while (<DATA>) {
    # if you want a hashref for each line:
    my $i;
    push(@data, { map { ($keys[$i++], $_) } unpack($format, $_) } );

    # else, take out references to @keys above and just use this:
    push(@data, [ unpack($format, $_) ] );
}
print Dumper(\@data);

__DATA__
Name            ID      PS        Gender   Age      Month    Code      Cap        Pool
LName, FName    99999   99        M        99.9     12/2000  Add       99.99      99.99
```

Have fun.
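Because the continuation line sits under the same columns as the heading, a heading-derived template can handle it too: its leading fields unpack as empty and can be copied from the previous record. A sketch along the lines of the code above, assuming that column alignment holds in the real report; `template_from_heading` and `parse_report` are hypothetical names, and it swaps `a` for `A` so `unpack` strips trailing blanks itself.

```perl
use strict;
use warnings;

# Build an unpack template from the heading line: each heading plus the
# blanks after it is one column; the last column takes the rest of the line.
sub template_from_heading {
    my ($heading) = @_;
    chomp $heading;
    my (@keys, @widths);
    while ($heading =~ /\G(\S+)\s*/g) {
        push @keys, $1;
        push @widths, $+[0] - $-[0];
    }
    die "no headings found\n" unless @keys;
    $widths[-1] = '*';
    return (\@keys, join '', map { "A$_" } @widths);
}

# Parse the data lines. Blank leading columns mark a continuation line, so
# they are copied from the previous record, stopping at the first non-blank
# column (a blank Code after the Month stays blank).
sub parse_report {
    my ($heading, @lines) = @_;
    my ($keys, $template) = template_from_heading($heading);
    my (@records, %last);
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my %rec;
        @rec{@$keys} = unpack $template, $line;
        for my $k (@$keys) {
            last if $rec{$k} ne '';
            $rec{$k} = $last{$k} // '';
        }
        %last = %rec;
        push @records, \%rec;
    }
    return \@records;
}

my $fmt = "%-16s%-8s%-10s%-9s%-9s%-9s%-10s%-11s%s";
my $records = parse_report(
    sprintf($fmt, qw(Name ID PS Gender Age Month Code Cap Pool)),
    sprintf($fmt, 'LName, FName', '99999', '99', 'M', '99.9', '12/2000', 'Add', '99.99', '99.99'),
    sprintf($fmt, ('') x 5, '11/2000', 'New', '99.99', '99.99'),
);
print join(', ', map { "$_=$records->[1]{$_}" } qw(Name Month Code)), "\n";
```

The carry-forward rule is a design choice: it fills only the blank columns to the left of the first value, so a missing Code on a continuation line is not silently copied from the line above.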
Re: Parsing a print file by repson (Chaplain) on Jan 05, 2001 at 10:34 UTC
Seems like I must be as bored as Fastolfe.... Here's my own version, which doesn't require fixed lengths and is more flexible in some ways.

```perl
#!perl -w
use strict;
use Data::Dumper;

my %fields_types;
my @data;
my %prev;
my $biggest = 0;

while (<DATA>) {
    if (/^#(.*)/) { # new field declaration
        my @fields = split ' ', $1;
        $biggest = @fields if @fields > $biggest;
        $fields_types{scalar(@fields)} = \@fields;
    } else {
        my @line = split /(?<!,)(?<!\s)\s+/, $_; # split on whitespace not following a comma
        if (exists($fields_types{scalar(@line)})) {
            my %tmp;
            @tmp{ @{ $fields_types{scalar(@line)} } } = @line;
            # use data from most recent full line of data
            if (scalar(@line) != $biggest) {
                for my $field ( keys %prev ) {
                    $tmp{$field} ||= $prev{$field};
                }
            } else {
                %prev = %tmp;
            }
            push @data, \%tmp;
        } else {
            warn "No definition for " . scalar(@line) . " field(s) at input line $.\n";
        }
    }
}
print Dumper( \@data );

__DATA__
#Name ID PS Gender Age Month Code Cap Pool
#Month Code Cap Pool
LName, FName 99999 99 M 99.9 12/2000 Add 99.99 99.99
11/2000 New 99.99 99.99
```

Update: Woo, my 100th post, too bad more than half of them came out under 10 points.
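One caveat when pointing this at the real report: the question suggests continuation lines are indented (the `\s{60}` test), and a regex `split` returns a leading empty field for an indented line, so the field count no longer matches the `#Month Code Cap Pool` declaration. A small sketch of the fix, assuming indented input; `split_fields` is a hypothetical helper. A blank Code column still lowers the count by one, so such lines would need their own field declaration.

```perl
use strict;
use warnings;

# The split from the reply above: break on runs of whitespace that do not
# follow a comma (or another blank), so "LName, FName" stays one field.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    # Assumption: continuation lines in the real print file are indented.
    # A regex split returns a leading empty field for an indented line,
    # which changes the field count, so strip leading whitespace first.
    $line =~ s/^\s+//;
    return split /(?<!,)(?<!\s)\s+/, $line;
}

for my $line ("LName, FName 99999 99 M 99.9 12/2000 Add 99.99 99.99\n",
              "            11/2000 New 99.99 99.99\n") {
    my @f = split_fields($line);
    print scalar(@f), " fields: ", join('|', @f), "\n";
}
```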
Re: Parsing a print file by Fastolfe (Vicar) on Jan 05, 2001 at 05:54 UTC
The formatting in your post is mucked up somehow. There are real `<code>` tags in there (perhaps because you're not closing your `<code>` tag?), which means spacing isn't preserved. So everyone can see what you're talking about:

```
Name            ID      PS        Gender   Age      Month    Code      Cap        Pool
LName, FName    99999   99        M        99.9     12/2000  Add       99.99      99.99
                                                    11/2000  New       99.99      99.99
```

I almost think what you want is a flexible regular expression, but if this is a fixed-length string, perhaps unpack will work for you.

```perl
$_ = "LName, FName    99999   99        M        99.9     12/2000  Add       99.99      99.99";
my @keys = qw{ name id ps gender age month code cap pool };
my %info;
@info{@keys} = unpack("a16a8a10a9a9a9a10a11a8", $_);
foreach (@keys) {
    $info{$_} =~ s/\s*$//;
    print "$_='$info{$_}'\n";
}
```

Adjust the unpack format to match your string (the number of characters per field seems to jump all over the place, but this is probably because it was typed and not cut/pasted), and you should be in good shape.
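The trimming loop can also be dropped: with the uppercase `A` template letter, `unpack` strips trailing whitespace and NULs from each field, while lowercase `a` returns the field verbatim. A sketch using the same widths as the template above; `parse_line` is a hypothetical name, and the sample line is built with `sprintf` only so the column widths are exact.

```perl
use strict;
use warnings;

# Same widths as the template in the reply above, but with "A" instead of
# "a": unpack strips trailing whitespace itself, so no s/\s*$// pass is
# needed. Leading spaces in a field are kept.
my @keys = qw(name id ps gender age month code cap pool);

sub parse_line {
    my ($line) = @_;
    my %info;
    @info{@keys} = unpack("A16A8A10A9A9A9A10A11A8", $line);
    return \%info;
}

my $line = sprintf "%-16s%-8s%-10s%-9s%-9s%-9s%-10s%-11s%s",
    'LName, FName', '99999', '99', 'M', '99.9', '12/2000', 'Add', '99.99', '99.99';
my $info = parse_line($line);
print "$_='$info->{$_}'\n" for @keys;
```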
Re: Parsing a print file by jakeeboy (Sexton) on Jan 06, 2001 at 00:15 UTC
Thank you so much!! I like that unpack function. It works wonderfully. I still have to work out some things, but you just made my life easier. Now that I can get them into a hash, it's time to manipulate them. Thanks again. Jake
Re: Parsing a print file by jakeeboy (Sexton) on Jan 05, 2001 at 21:31 UTC
Thanks for your help. I'll go ahead and work with each of the responses and see which one I like. Again, thanks for your time and help. Jake