Report parsing

Dalin has asked for the wisdom of the Perl Monks concerning the following question:

Hey all, I have a report with three different sections that I need to split up and turn into spreadsheets. I need to pull data out of the report, from one point to another, three separate times. For example, the report has info for three different regions. Each region's section begins with the word "REGION" followed by a region number and name, and ends with a line containing the words "REGION TOTAL". What I need to do is collect the info between "REGION" and "REGION TOTAL" and save it into a separate array for later processing, once for each of the three regions in the report. I can successfully test for and grab the line that begins with "REGION". How do I grab the info up to and including the "REGION TOTAL" line? Here is what I have so far:

#!/usr/local/bin/perl -w
#
#
#
####

use strict;

####
#
#
#
####

my $past_file = "pastinv.txt";
open(PFILE,"$past_file") || die "$past_file,$!";
my @pinv1 = <PFILE>;
close PFILE || die "$past_file,$!";


####
#
#
#
#
####

my @tmp_pinv = "@pinv1";
my $count = @pinv1;
my ($i,@sue,@mike,@steve);

for ($i = 0; $i<= $count -1; $i++) {
        my $line = shift(@pinv1);
        chomp $line;
        if ( $line =~ /Region/ ) {
                $line =~ s/\s+/ /g;
                my($region,$reg_num,@name) = split(/ /,$line);
                if ( $reg_num =~ /(7A)/ ) {
                        unshift(@steve,$region,$reg_num,@name);
                        print "@steve\n";
                        #region_info(\@steve,\@tmp_pinv);
                }elsif ( $reg_num == 7 ) {
                        unshift(@sue,$region,$reg_num,@name);
                        print "@sue\n";
                }elsif ( $reg_num == 8 ) {
                        unshift(@mike,$region,$reg_num,@name);
                        print "@mike\n";
                }
        }else{
                next;
        }
}

This is the code I'm using to grab the lines that have the region numbers in them. The commented-out call to the sub region_info was one of my many attempts at accomplishing what I'm trying to do here. If anyone can help me out I would greatly appreciate it.

Thanks in advance,
Bradley

Where ever there is confusion to be had... I'll be there.


Replies are listed 'Best First'.
Re: Report parsing by AidanLee (Chaplain) on Jan 24, 2002 at 20:22 UTC
Well, I'm not sure of the specifics of what your data looks like, but something like this ought to work:

my $currentRegion = '';
my %regions = ();
while (<FILE>) {
        chomp;
        # region numbers may carry a letter suffix (e.g. 7A)
        /REGION (\d+[A-Z]?) (.+)/ and do {
                $currentRegion = $1;
                $regions{$currentRegion}{Name} = $2;
                next;
        };
        /REGION TOTAL (.+)/ and do {
                $regions{$currentRegion}{Total} = $1;
                $currentRegion = '';
                next;
        };
        do {
                push @{$regions{$currentRegion}{Data}}, $_ if $currentRegion ne '';
                next;
        };
}

You now have a hash of the regions. You can then just pick out the regions that belong to each person like this:

# steve's regions (some added guessed region numbers for demonstration):
foreach my $region ( @regions{ '7A', '8A', '9A' } ) {
        # do stuff to each $region that belongs to steve...
}
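For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch of the same state-tracking approach. The sample report text is made up (the real layout is not shown in the thread), the data is read from an in-memory filehandle instead of a file, and, as an assumption about what the question asks for, the "REGION" and "REGION TOTAL" lines are stored along with the data lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; the real report layout is not shown in the thread.
my $report = <<'END';
REPORT HEADER
REGION 7A STEVE
widget 10
gadget 20
REGION TOTAL 30
REGION 7 SUE
widget 5
REGION TOTAL 5
END

# Read the string through a filehandle so the loop looks like real file I/O.
open(my $fh, '<', \$report) or die "Cannot read sample data: $!";

my $current = '';   # region number we are inside, or '' when outside
my %regions;

while (my $line = <$fh>) {
    chomp $line;

    # Test for the end marker first, since it also contains "REGION".
    if ($line =~ /^REGION TOTAL/) {
        if ($current ne '') {
            push @{ $regions{$current}{Data} }, $line;   # keep the total line
            $current = '';
        }
        next;
    }

    # Start marker: "REGION", a number with an optional letter, then a name.
    if ($line =~ /^REGION\s+(\d+[A-Z]?)\s+(.+)/) {
        $current = $1;
        $regions{$current}{Name} = $2;
        push @{ $regions{$current}{Data} }, $line;       # keep the header line
        next;
    }

    # Ordinary line: keep it only while inside a region.
    push @{ $regions{$current}{Data} }, $line if $current ne '';
}
close $fh;

for my $num (sort keys %regions) {
    printf "%s (%s): %d lines\n", $num, $regions{$num}{Name},
        scalar @{ $regions{$num}{Data} };
}
```

Anchoring the patterns with ^ is a guess that the markers start in the first column; drop the anchors if the real report indents them.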
Re: Re: Report parsing by Dalin (Sexton) on Jan 24, 2002 at 20:42 UTC
I'm working with the example you gave, but I am having trouble getting it to work. I get nothing when I attempt to print any values.

Where ever there is confusion to be had... I'll be there.
Re: Re: Re: Report parsing by AidanLee (Chaplain) on Jan 24, 2002 at 20:54 UTC
Well, here are some debugging tips (note I did not test the code...). First, I only gave you a section of the code. I'm sure you can see that I didn't open the file yet, which obviously needs to be done. Also, it's a good idea to `use strict` at all times, and either `use warnings` if you're running perl 5.6 or later, or add the `-w` switch when running your script. My personal favorite for monitoring data structures is to use Data::Dumper to view what's going on:

use Data::Dumper;
print Dumper(\%regions);

will give you output of what the `%regions` hash currently looks like. HTH
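As a rough illustration of that debugging tip, the sketch below dumps a small hand-built hash shaped like the `%regions` structure discussed above. The contents are invented for the example; `$Data::Dumper::Sortkeys` and `$Data::Dumper::Indent` are standard Data::Dumper settings that make the output easier to compare between runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical structure, shaped like the %regions hash from the reply above.
my %regions = (
    '7A' => {
        Name => 'STEVE',
        Data => [ 'widget 10', 'gadget 20' ],
    },
);

$Data::Dumper::Sortkeys = 1;   # stable key order, so successive dumps compare easily
$Data::Dumper::Indent   = 1;   # fixed, more compact indentation

# Pass a reference so the whole hash is dumped as one structure.
my $dump = Dumper(\%regions);
print $dump;
```

If the dump shows an empty hash, the patterns never matched, which points at the regexes or at a filehandle that was never opened rather than at the printing code.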
Re: Report parsing by talexb (Chancellor) on Jan 24, 2002 at 20:27 UTC
Because both the start and the end markers for the section have the word "region", you'll have to work a little harder to catch the boundaries. You could do it this way ..

if ( /REGION/ ) {
        if ( /TOTAL/ ) {
                # Finish with that section
        } else {
                # Start with that section
        }
} else {
        # Regular data line .. handle normally (see below)
}

That's something that reads pretty well. Then, I'd probably use a hash to store how region numbers map to different arrays:

my %SalesReps = ( '7A' => \@steve, 7 => \@sue, 8 => \@mike );

So when you're starting a Region, pick the appropriate array from the hash ..

my $ThisArray = $SalesReps{ $reg_num };

and when you get to the code where you're handling a regular data line, use that array reference to unshift the data into the array.

unshift ( @{ $ThisArray }, $region, $reg_num, @name );

What I haven't done here is catch errors where the region number doesn't match anything in the hash -- it's important to add that code in.

Update: I'm not sure why you print the array after each record .. it seems to me that you'd get record 1, 1-2, 1-3, 1-4 and so forth. Why not wait till the end of the input file to do that? Better yet, write the data to three output files .. less memory usage that way.

--t. alex
"Of course, you realize that this means war." -- Bugs Bunny.
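Putting those pieces together, a minimal sketch might look like the following. It assumes made-up sample lines in place of the real report, uses push rather than unshift so each array keeps the lines in file order, and records unknown region numbers in a hypothetical `@unknown` array as one possible way to do the error check mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@steve, @sue, @mike);

# Region number => array that collects that region's lines.
my %SalesReps = ( '7A' => \@steve, '7' => \@sue, '8' => \@mike );

# Hypothetical sample lines; region 9 is deliberately not in the hash.
my @report = (
    'REGION 7A STEVE',
    'widget 10',
    'REGION TOTAL 10',
    'REGION 9 NOBODY',
    'widget 1',
    'REGION TOTAL 1',
    'REGION 8 MIKE',
    'gadget 4',
    'REGION TOTAL 4',
);

my $ThisArray;   # undef while outside a known region
my @unknown;     # region numbers with no entry in %SalesReps

for my $line (@report) {
    if ($line =~ /REGION/) {
        if ($line =~ /TOTAL/) {
            # Finish the section: keep the total line, then leave the region.
            push @{$ThisArray}, $line if $ThisArray;
            undef $ThisArray;
        }
        else {
            # Start a section: the second field is the region number.
            my (undef, $reg_num) = split ' ', $line;
            $ThisArray = $SalesReps{$reg_num};
            if ($ThisArray) {
                push @{$ThisArray}, $line;
            }
            else {
                push @unknown, $reg_num;   # skip lines until the next region
            }
        }
    }
    else {
        # Regular data line: store it only inside a known region.
        push @{$ThisArray}, $line if $ThisArray;
    }
}

print "steve: @steve\n";
print "mike:  @mike\n";
print "unknown regions: @unknown\n" if @unknown;
```

Leaving `$ThisArray` undefined for an unknown region means its data lines are dropped quietly; dying at that point instead would be a reasonable choice if an unexpected region should stop the run.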
Re: Re: Report parsing by Dalin (Sexton) on Jan 24, 2002 at 20:45 UTC
The prints were my tests to make sure I was grabbing the appropriate lines. I'll try what you have offered and get back to you. Thanks.

Where ever there is confusion to be had... I'll be there.