pulling fields out of an ascii print file

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: pulling fields out of a ascii print file by Limbic~Region (Chancellor) on May 06, 2003 at 23:18 UTC
Anonymous Monk,

May I suggest Data Munging with Perl as a good place to start. I would also highly recommend Mastering Regular Expressions. I am going to give your problem a stab just because I haven't ever done something like this before in Perl. Beware untested code - dragons lie ahead!

```perl
#!/usr/bin/perl -w
use strict;

open (INPUT, "file") or die "Unable to open file : $!";
open (OUTPUT, ">outfile") or die "Problem with outfile : $!";
select OUTPUT;
$/ = "";      # paragraph mode: read one blank-line-separated block at a time
$\ = "\n";    # append a newline to every print

my @data;
my $foo;      # company name (text before the colon)
my $bar;      # company id (text after the colon)

while (<INPUT>) {
    my @lines = split /\n/;
    my $blah;    # item number for the detail lines that follow
    while (defined(my $line = shift @lines)) {
        $line =~ s/^\s*//;
        last if ($line =~ /^Total/);      # a Total line ends this block
        next if ($line =~ /^\s*$/);
        if ($line =~ /^([^:]*):(.*)$/) {
            $foo = $1;
            $bar = $2;
            next;
        }
        elsif ($line =~ /^(\d+)$/) {
            $blah = $1;
            next;
        }
        else {
            # detail line: date, name and amount separated by 2+ spaces
            my @stuff = split /\s{2,}/ , $line;
            push @data , join ',' , ($bar, "\"$foo\"", $blah, "\"$stuff[0]\"", "\"$stuff[1]\"", $stuff[2]);
        }
    }
}
print foreach(@data);
```

Now, as I said, I wrote that code just because I have never done this sort of thing before. That means I am not very experienced and there is most likely a better way.

Cheers - L~R
Re: pulling fields out of a ascii print file by jdporter (Paladin) on May 07, 2003 at 10:55 UTC
Limbic~Region's solution is fine; I'm throwing mine in just for a slightly different point of view.

```perl
use strict;
use warnings;

my ( $company, $co_id );
my $item;

local $, = ",";     # field separator for print
local $\ = "\n";    # record separator for print

while (<DATA>) {
    next unless /\S/;    # ignore blank lines
    next if defined $item  && /^ *Total \Q$item\E /;
    next if defined $co_id && /^ *Total \Q$company:$co_id\E /;

    if ( /^ *(.+):(\S+) *$/ ) {
        ( $company, $co_id ) = ( $1, $2 );
    }
    elsif ( /^ *(\S+) *$/ ) {
        $item = $1;
    }
    else {
        # detail line: 10 spaces, 10-char date field, 5 spaces, name, spaces, amount
        my ( $date, $person, $value ) = /^ {10}(.{10}) {5}(.*) +(\S+)/;
        $person =~ s/ +$//;
        $date   =~ s/^ +//;
        print $co_id, qq("$company"), $item, qq("$date"), qq("$person"), $value;
    }
}
__DATA__
Slate Enterprises, Inc.:2001050.01
104
           1/24/2002     Johnson, Dean A.     10.00
           1/24/2002     Botwell, Michael J   10.00
Total 104                                     20.00
302
           1/25/2002     Beers, James T.       2.50
           1/28/2002     Beers, James T.       4.00
           1/29/2002     Beers, James T.       1.00
           1/30/2002     Beers, James T.       0.50
Total 302                                      8.00
Total Slate Enterprises, Inc.:2001050.01      28.00
```

Since the desired output appears to be CSV, one might consider using the Text::CSV_XS module for formatting the output:

```perl
use Text::CSV_XS;
my $csv = Text::CSV_XS->new;
# ...
$csv->combine( $co_id, $company, $item, $date, $person, $value );
# if $, and $\ are still set as in the script above, use: print $csv->string;
print $csv->string, "\n";
```

jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.*
Re: pulling fields out of a ascii print file by shemp (Deacon) on May 06, 2003 at 21:03 UTC
By print file, do you mean a file where a specific piece of data is generally in the same position (a fixed-length or flat file)? If so, unpack is probably what you want to use.

For instance, if each line above that contains a date, name, and number is laid out so that each field always begins at a specific character, unpack will work. Let's just say that there are 10 spaces, 12 chars for the date, 32 for the name, and 10 for the number. Here's how to get the info from one line of that format:

`my ($garbage, $date, $name, $number) = unpack "A10A12A32A10", $data_line;`

Of course, it appears that not all lines in your file have the same format, so you'll need to use contextual clues to decide how to parse different lines.
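A minimal sketch of the unpack approach on a single line, assuming the illustrative widths from the post (10, 12, 32, 10) rather than the real report layout. The helper name `parse_detail_line` and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-width detail line into fields.
# The widths are the illustrative ones from the post, not measured from
# the real file: 10 filler, 12 date, 32 name, 10 number.
sub parse_detail_line {
    my ($data_line) = @_;
    # "A" fields have trailing spaces stripped; the first field is filler.
    my ( undef, $date, $name, $number ) = unpack "A10 A12 A32 A10", $data_line;
    s/^\s+// for $date, $name, $number;    # drop leading padding as well
    return ( $date, $name, $number );
}

# Build a sample line with sprintf so the columns match the assumed widths.
my $sample = sprintf "%-10s%-12s%-32s%10s",
    "", "1/24/2002", "Johnson, Dean A.", "10.00";

my ( $date, $name, $number ) = parse_detail_line($sample);
print join( "|", $date, $name, $number ), "\n";
# prints: 1/24/2002|Johnson, Dean A.|10.00
```

The "A" format only trims trailing spaces, so a right-justified number still needs its leading padding removed, which is what the extra substitution does.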
Re: pulling fields out of a ascii print file by dragonchild (Archbishop) on May 06, 2003 at 20:38 UTC
What you're asking is:

1. How do I parse a file in a given format into Perl data structures?
2. How do I transform one Perl data structure into another?
3. How do I write a Perl data structure out to a file?

How far have you gone in handling this? I would suggest writing your program in terms of those three questions.

------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
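One way to structure a program around those three questions is sketched below. It assumes the report layout shown in the sample data elsewhere in this thread (a "Company:ID" line, item-number lines, detail lines whose fields are separated by two or more spaces, and "Total" lines). The sub names `parse_report`, `flatten`, and `write_rows` are hypothetical.

```perl
use strict;
use warnings;

# Step 1: parse the report text into a nested structure
# (companies -> items -> entries). The layout is an assumption.
sub parse_report {
    my ($text) = @_;
    my ( @companies, $company, $item );
    for my $line ( split /\n/, $text ) {
        next unless $line =~ /\S/;         # skip blank lines
        next if $line =~ /^\s*Total\b/;    # skip subtotal and total lines
        if ( $line =~ /^\s*(.+):(\S+)\s*$/ ) {
            $company = { name => $1, id => $2, items => [] };
            push @companies, $company;
        }
        elsif ( $line =~ /^\s*(\d+)\s*$/ ) {
            $item = { number => $1, entries => [] };
            push @{ $company->{items} }, $item;
        }
        else {
            ( my $trimmed = $line ) =~ s/^\s+|\s+$//g;
            my ( $date, $person, $value ) = split /\s{2,}/, $trimmed;
            push @{ $item->{entries} },
                { date => $date, person => $person, value => $value };
        }
    }
    return @companies;
}

# Step 2: transform the nested structure into flat rows.
sub flatten {
    my (@companies) = @_;
    my @rows;
    for my $c (@companies) {
        for my $i ( @{ $c->{items} } ) {
            for my $e ( @{ $i->{entries} } ) {
                push @rows, [ $c->{id}, $c->{name}, $i->{number},
                              @{$e}{qw(date person value)} ];
            }
        }
    }
    return @rows;
}

# Step 3: write the rows to a filehandle as comma-separated lines,
# quoting the text fields the same way the other replies do.
sub write_rows {
    my ( $fh, @rows ) = @_;
    for my $r (@rows) {
        my ( $id, $name, $item, $date, $person, $value ) = @$r;
        print {$fh} join( ",", $id, qq("$name"), $item,
                          qq("$date"), qq("$person"), $value ), "\n";
    }
}

my $report = <<'END';
Slate Enterprises, Inc.:2001050.01
104
           1/24/2002     Johnson, Dean A.     10.00
           1/24/2002     Botwell, Michael J   10.00
Total 104                                     20.00
Total Slate Enterprises, Inc.:2001050.01      20.00
END

write_rows( \*STDOUT, flatten( parse_report($report) ) );
```

Keeping the three steps in separate subs means the parser can be checked against the raw report without worrying about the output format, and the output format can change without touching the parser.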
Re: pulling fields out of a ascii print file by Cody Pendant (Prior) on May 06, 2003 at 21:27 UTC
I don't have anything particularly helpful to add, but I would handle it with a regex-on-every-line solution.

Making the unproven assumption that the blocks of data are consistent, i.e. that they start with a name, the next line is a number only, and they end with two lines both beginning with the word "Total", you can have your program read each line and do different kinds of data-grabbing based on context: `$current_company = 'Slate'` and `$reading_names = true` until you get to the word Total, at which time `$reading_names` goes back to false, and so on.

I'd like to hear other Monks' thoughts on this though. It's prone to error and you have to be very sure of the data to do it with any confidence.

--
“Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
M-J D
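The context-variable idea could look roughly like the sketch below. It rests on the same unproven assumption about consistent blocks, and on detail fields being separated by two or more spaces. The sub name `extract_rows` is hypothetical, and unexpected lines are reported with a warning instead of being guessed at.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the lines once, tracking context in
# $current_company, $current_item and the $reading_names flag.
sub extract_rows {
    my (@lines) = @_;
    my ( $current_company, $current_item );
    my $reading_names = 0;    # true while inside a block of detail lines
    my @rows;
    for my $line (@lines) {
        next unless $line =~ /\S/;    # ignore blank lines
        if ( $line =~ /^\s*Total\b/ ) {
            $reading_names = 0;       # a Total line closes the block
        }
        elsif ( $line =~ /^\s*(\d+)\s*$/ ) {
            $current_item  = $1;
            $reading_names = 1;       # detail lines follow an item number
        }
        elsif ($reading_names) {
            ( my $trimmed = $line ) =~ s/^\s+|\s+$//g;
            my ( $date, $name, $amount ) = split /\s{2,}/, $trimmed;
            push @rows,
                [ $current_company, $current_item, $date, $name, $amount ];
        }
        elsif ( $line =~ /^\s*(.+):\S+\s*$/ ) {
            $current_company = $1;
        }
        else {
            warn "Unrecognised line: $line\n";    # data did not fit the assumptions
        }
    }
    return @rows;
}

my @sample = (
    "Slate Enterprises, Inc.:2001050.01",
    "302",
    "           1/25/2002     Beers, James T.       2.50",
    "Total 302                                      2.50",
);
print join( "|", @$_ ), "\n" for extract_rows(@sample);
# prints: Slate Enterprises, Inc.|302|1/25/2002|Beers, James T.|2.50
```

The warning branch is there because of the risk mentioned above: if the data stops matching the assumed pattern, the program says so instead of silently producing wrong rows.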