Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am totally new to Perl, so I humbly ask: How can I use a Perl script to convert an ASCII print file containing this:
        Slate Enterprises, Inc.:2001050.01
          104
           1/24/2002     Johnson, Dean A.                         10.00
           1/24/2002     Botwell, Michael J                       10.00
          Total 104                                               20.00

          302
           1/25/2002     Beers, James T.                           2.50
           1/28/2002     Beers, James T.                           4.00
           1/29/2002     Beers, James T.                           1.00
           1/30/2002     Beers, James T.                           0.50
          Total 302                                                8.00

        Total Slate Enterprises, Inc.:2001050.01                  28.00

To a file formatted like this:

2001050.01,"Slate Enterprises, Inc.",104,"1/24/2002","Johnson, Dean A.",10.00
2001050.01,"Slate Enterprises, Inc.",104,"1/24/2002","Botwell, Michael J.",10.00
2001050.01,"Slate Enterprises, Inc.",302,"1/25/2002","Beers, James T.",2.50
2001050.01,"Slate Enterprises, Inc.",302,"1/28/2002","Beers, James T.",4.00
2001050.01,"Slate Enterprises, Inc.",302,"1/29/2002","Beers, James T.",1.00
2001050.01,"Slate Enterprises, Inc.",302,"1/30/2002","Beers, James T.",0.50


Thank you in advance.

Monk in Training

Replies are listed 'Best First'.
Re: pulling fields out of a ascii print file
by Limbic~Region (Chancellor) on May 06, 2003 at 23:18 UTC
    Anonymous Monk,
    May I suggest Data Munging with Perl as a good place to start. I would also highly recommend Mastering Regular Expressions. I am going to give your problem a stab just because I haven't ever done something like this before in Perl.

    Beware untested code - dragons lie ahead!

    #!/usr/bin/perl -w
    use strict;

    open (INPUT, "file") or die "Unable to open file : $!";
    open (OUTPUT, ">outfile") or die "Problem with outfile : $!";
    select OUTPUT;
    $/ = "";      # paragraph mode: each read returns one blank-line-delimited block
    $\ = "\n";    # print appends a newline

    my @data;
    my $foo;      # company name
    my $bar;      # company id

    while (<INPUT>) {
        my @lines = split /\n/;
        my $blah;                            # item number of the current block
        while (defined(my $line = shift @lines)) {
            $line =~ s/^\s*//;
            last if ($line =~ /^Total/);
            next if ($line =~ /^\s*$/);      # skip empty lines
            if ($line =~ /^([^:]*):(.*)$/) {
                $foo = $1;
                $bar = $2;
                next;
            }
            elsif ($line =~ /^(\d+)$/) {
                $blah = $1;
                next;
            }
            else {
                my @stuff = split /\s{2,}/ , $line;
                push @data , join ',' , ($bar, "\"$foo\"", $blah, "\"$stuff[0]\"", "\"$stuff[1]\"", $stuff[2]);
            }
        }
    }
    print foreach(@data);
    Now as I said - I wrote that code just because I have never done this sort of thing before. That means I am not very experienced and there is most likely a better way.

    Cheers - L~R

Re: pulling fields out of a ascii print file
by jdporter (Paladin) on May 07, 2003 at 10:55 UTC
    Limbic~Region's solution is fine; I'm throwing mine in just for a slightly different point of view.
    use strict;
    use warnings;

    my( $company, $co_id );
    my $item;
    local $, = ",";
    local $\ = "\n";

    while (<DATA>) {
        next unless /\S/; # ignore blank lines
        next if defined $item && /^ *Total \Q$item\E /;
        next if defined $co_id && /^ *Total \Q$company:$co_id\E /;
        if ( /^ *(.+):(\S+) *$/ ) {
            ( $company, $co_id ) = ( $1, $2 );
        }
        elsif ( /^ *(\S+) *$/ ) {
            $item = $1;
        }
        else {
            my( $date, $person, $value ) = /^ {10}(.{10}) {5}(.*) +(\S+)/;
            $person =~ s/ +$//;
            $date =~ s/^ +//;
            print $co_id, qq("$company"), $item, qq("$date"), qq("$person"), $value;
        }
    }
    __DATA__
            Slate Enterprises, Inc.:2001050.01
              104
               1/24/2002     Johnson, Dean A.                         10.00
               1/24/2002     Botwell, Michael J                       10.00
              Total 104                                               20.00

              302
               1/25/2002     Beers, James T.                           2.50
               1/28/2002     Beers, James T.                           4.00
               1/29/2002     Beers, James T.                           1.00
               1/30/2002     Beers, James T.                           0.50
              Total 302                                                8.00

            Total Slate Enterprises, Inc.:2001050.01                  28.00
    Since the desired output appears to be CSV, one might consider using the Text::CSV_XS module for formatting the output:
    use Text::CSV_XS;
    my $csv = new Text::CSV_XS;
    . . .
    . . .
    $csv->combine( $co_id, $company, $item, $date, $person, $value );
    print $csv->string, "\n";

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: pulling fields out of a ascii print file
by shemp (Deacon) on May 06, 2003 at 21:03 UTC
    By print file, do you mean a file where a specific piece of data is always in the same position (a fixed-length or flat file)? If so, unpack is probably what you want to use. For instance, if every line above that contains a date, name, and number is laid out so that each field begins at a specific character position, unpack will work.

    Let's just say that there are 10 spaces, 12 chars for the date, 32 for the name, and 10 for the number. Here's how to get the info from one line of that format:
    my ($garbage, $date, $name, $number) = unpack "A10A12A32A10", $data_line;
    Of course, it appears that not all lines are the same format in your file, so you'll need to use contextual clues to decide how to parse different lines.
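
    The widths above were only placeholders. As a rough sketch, here is the same unpack idea applied to a detail line laid out like the sample in the question. The column widths (10 blanks, a 10-wide right-aligned date, 5 blanks, a 41-wide name, then the amount) are assumptions read off that sample, not a known specification of the print file.

```perl
use strict;
use warnings;

# Build one detail line with the ASSUMED fixed layout:
# 10 blanks, 10-wide right-aligned date, 5 blanks, 41-wide name, amount.
my $data_line = sprintf '%10s%10s%5s%-41s%5s',
    '', '1/24/2002', '', 'Johnson, Dean A.', '10.00';

# "x" skips bytes; "A" takes a space-padded field and strips trailing blanks.
my ( $date, $name, $amount ) = unpack 'x10 A10 x5 A41 A*', $data_line;

# "A" leaves leading blanks alone, so trim the right-aligned fields by hand.
s/^\s+// for $date, $amount;

print join( '|', $date, $name, $amount ), "\n";   # 1/24/2002|Johnson, Dean A.|10.00
```

    Using x to skip filler columns avoids the throwaway $garbage variable. If the real file uses different columns, only the template string needs to change.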
Re: pulling fields out of a ascii print file
by dragonchild (Archbishop) on May 06, 2003 at 20:38 UTC
    What you're asking is:
    1. How do I parse a file in a given format into Perl data structures?
    2. How do I transform one Perl data structure into another?
    3. How do I write a Perl data structure out to a file?
    How far have you gone in handling this? I would suggest writing your program in terms of those three questions.
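
    One way to read those three questions as code is sketched below: one sub per step, run against the sample from the question. The sub names (parse_report, to_rows, csv_line) and the rule "quote anything that is not a plain number" are illustrative assumptions, not an established recipe.

```perl
use strict;
use warnings;

my @report = (
    '        Slate Enterprises, Inc.:2001050.01',
    '          104',
    '           1/24/2002     Johnson, Dean A.                         10.00',
    '           1/24/2002     Botwell, Michael J                       10.00',
    '          Total 104                                               20.00',
    '',
    '          302',
    '           1/25/2002     Beers, James T.                           2.50',
    '           1/28/2002     Beers, James T.                           4.00',
    '           1/29/2002     Beers, James T.                           1.00',
    '           1/30/2002     Beers, James T.                           0.50',
    '          Total 302                                                8.00',
    '',
    '        Total Slate Enterprises, Inc.:2001050.01                  28.00',
);

# Step 1: parse the report lines into a list of records (hash references).
sub parse_report {
    my @lines = @_;
    my ( $company, $co_id, $item, @records );
    for my $line (@lines) {
        next unless $line =~ /\S/;          # blank line
        next if $line =~ /^\s*Total\b/;     # subtotal or grand total
        if ( $line =~ /^\s*(.+):(\S+)\s*$/ ) {          # "Company:id" header
            ( $company, $co_id ) = ( $1, $2 );
        }
        elsif ( $line =~ /^\s*(\d+)\s*$/ ) {            # item number alone
            $item = $1;
        }
        elsif ( $line =~ m{^\s*(\d+/\d+/\d+)\s{2,}(.+?)\s{2,}(\S+)\s*$} ) {
            push @records, {
                co_id => $co_id, company => $company, item  => $item,
                date  => $1,     person  => $2,       value => $3,
            };
        }
    }
    return @records;
}

# Step 2: transform each record into a row in the desired column order.
sub to_rows {
    my @records = @_;
    return map { [ @{$_}{qw(co_id company item date person value)} ] } @records;
}

# Step 3: format one row as a CSV line, quoting everything that is not
# a plain number (an assumed rule that matches the desired output).
sub csv_line {
    my @fields = @_;
    return join ',', map {
        my $f = $_;
        if ( $f !~ /^-?\d+(?:\.\d+)?$/ ) {
            $f =~ s/"/""/g;                 # double any embedded quotes
            $f = qq("$f");
        }
        $f;
    } @fields;
}

my @csv = map { csv_line(@$_) } to_rows( parse_report(@report) );
print "$_\n" for @csv;
```

    Keeping the steps separate means the parser can be tested on its own, and the output step can later be swapped for a CSV module without touching the parsing.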

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: pulling fields out of a ascii print file
by Cody Pendant (Prior) on May 06, 2003 at 21:27 UTC
    I don't have anything particularly helpful, but I would handle it with a regex-on-every-line solution.

    Making the unproven assumption that the blocks of data are consistent, i.e. that they start with a name, the next line is a number only, and they end with two lines both beginning with the word "total", you can have your program read each line and do different kinds of data-grabbing based on context-- $current_company = 'Slate'; $reading_names = true until you get to the word Total at which time $reading_names goes back to false and so on.

    I'd like to hear other Monks' thoughts on this though. It's prone to error and you have to be very sure of the data to do it with any confidence.
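
    One way to gain some of that confidence, sketched here as a suggestion rather than a complete answer, is to treat the Total lines as checksums: add up the detail lines of each block and compare the sum with the subtotal the report prints. The regexes assume the layout of the sample in the question, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @report = (
    '        Slate Enterprises, Inc.:2001050.01',
    '          104',
    '           1/24/2002     Johnson, Dean A.                         10.00',
    '           1/24/2002     Botwell, Michael J                       10.00',
    '          Total 104                                               20.00',
    '',
    '          302',
    '           1/25/2002     Beers, James T.                           2.50',
    '           1/28/2002     Beers, James T.                           4.00',
    '           1/29/2002     Beers, James T.                           1.00',
    '           1/30/2002     Beers, James T.                           0.50',
    '          Total 302                                                8.00',
    '',
    '        Total Slate Enterprises, Inc.:2001050.01                  28.00',
);

my ( $item, $sum, @problems );
for my $line (@report) {
    if ( $line =~ /^\s*(\d+)\s*$/ ) {
        # An item number on its own line starts a new block.
        ( $item, $sum ) = ( $1, 0 );
    }
    elsif ( defined $item
        && $line =~ /^\s*Total \Q$item\E\s+(\d+\.\d+)\s*$/ )
    {
        # The subtotal closes the block: compare it with our own sum.
        push @problems, "item $item: lines add up to $sum, report says $1"
            if sprintf( '%.2f', $sum ) ne sprintf( '%.2f', $1 );
        undef $item;
    }
    elsif ( defined $item
        && $line =~ m{^\s*\d+/\d+/\d+\s.*\s(\d+\.\d+)\s*$} )
    {
        # A detail line inside a block: accumulate its amount.
        $sum += $1;
    }
}

print "$_\n" for @problems;
print "all subtotals match\n" unless @problems;
```

    A mismatch does not say which line was misparsed, but it does flag the block so a human can look at it before the CSV is trusted.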
    --

    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D