Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I posted a couple of days ago asking how to make arrays out of a text file. I tried it but found that it didn't exactly fit my needs. What I need to be able to do is input a text file and process its contents, being able to grab any piece of data and output it in any order. Here's an example of a text file that I'll be using...
HDR,831 Reconciliation,200008081441EST
BGN,11,000000158,20000808,10520000
OTI,GH,BT,WEDI620000802,159
AMT,2,231644
AMT,BT,231644
AMT,NP,0
QTY,46,10
QTY,53,0
QTY,54,10
QTY,55,0
TED,ZZZ,BATCH AWAITING 831
OTI,TR,CK,0021000095,159,000101416
TED,ZZZ,MOD CHECK ON RDFI-ID FAILED
Note: there could be multiple lines that begin with QTY, TED and OTI. What I did was write a Perl script that reads the file and splits each line into six items (for lines that don't have six items it fills the empty ones in as blank); it then counts how many times each first item appears. Here is the code I have written thus far (I know it's not the most efficient but it works!)
#!/usr/local/bin/perl5
#PERL Script

#1. Input the text file containing the formatted info
open (FILE,"<824.txt") || die "Can't open file $!";
@lines = <FILE>;

#initialize the counters
$HDRcount = 0;
$BGNcount = 0;
$OTIcount = 0;
$AMTcount = 0;
$QTYcount = 0;
$TEDcount = 0;

#Create the main array
foreach (@lines) {
    chomp;
    ($key, $first, $second, $third, $fourth, $fifth) = (split(/,/));
    $key = "" if !defined($key); #I know this part isn't needed but i'd rather not get errors
    $first = "" if !defined($first);
    $second = "" if !defined($second);
    $third = "" if !defined($third);
    $fourth = "" if !defined($fourth);
    $fifth = "" if !defined($fifth);
    $sixth = "" if !defined($sixth);
    print("Key=$key 1=$first 2=$second 3=$third 4=$fourth 5=$fifth\n");

    #Search through firstElementArray counting up each instance of
    # HDR, BGN, OTI, AMT, QTY, TED
    if ($key eq "HDR") {
        $HDRcount++;
        #create an array here, similar to @HDR($first, $second, $third)
    }
    $BGNcount++ if ($key eq "BGN");
    $OTIcount++ if ($key eq "OTI");
    $AMTcount++ if ($key eq "AMT");
    $QTYcount++ if ($key eq "QTY");
    $TEDcount++ if ($key eq "TED");
}

#test to make sure it works
print("HDR=$HDRcount BGN=$BGNcount OTI=$OTIcount AMT=$AMTcount QTY=$QTYcount TED=$TEDcount\n");
close(FILE);
What I would like to be able to do is go in and output the data in any format. For example, to print out the second and third items of all the QTYs, the third item in the HDR, and the fourth and fifth items of all of the OTIs. I've been looking at associative arrays/hashes and everything else and I really don't know what to use. Any suggestions?

Replies are listed 'Best First'.
Re: Creating the right type of array...
by ncw (Friar) on Aug 30, 2000 at 00:54 UTC
    Here is how I would do it. Note I'd class this as fairly advanced perl - hash refs, array refs all nested etc, but an idiom well worth picking up.
    #!/usr/bin/perl -w
    use strict;
    my ($key);

    # Collect the data
    my $data = {};
    while (<>) {
        chomp;
        my @row = split /,/;
        next unless @row;
        $key = shift @row;
        # the following is the most important line in the program
        # understand it and you've got it!
        push @{$data->{$key}}, [ @row ];
    }

    # Uncomment these lines if you have Data::Dumper and you want to see
    # what the data structure looks like
    # use Data::Dumper;
    # $Data::Dumper::Terse = 1;
    # print Dumper($data);

    print "Total count\n";
    for $key (sort keys %$data) {
        print "  $key has ", scalar @{$data->{$key}} , " entries\n";
    }

    print "Second and third items of all QTYs\n";
    for $key (@{$data->{QTY}}) {
        print "  $key->[0], $key->[1]\n";
    }

    print "Third items in HDRs\n";
    for $key (@{$data->{HDR}}) {
        print "  $key->[1]\n";
    }

    print "4th and 5th items in OTIs\n";
    for $key (@{$data->{OTI}}) {
        print "  $key->[2], $key->[3]\n";
    }
    Run this with
      perl progname.pl 824.txt
    

    When I run it on your test data it produces this output

    Total count
      AMT has 3 entries
      BGN has 1 entries
      HDR has 1 entries
      OTI has 2 entries
      QTY has 4 entries
      TED has 2 entries
    Second and third items of all QTYs
      46, 10
      53, 0
      54, 10
      55, 0
    Third items in HDRs
      200008081441EST
    4th and 5th items in OTIs
      WEDI620000802, 159
      0021000095, 159
    
Re: Creating the right type of array...
by chromatic (Archbishop) on Aug 30, 2000 at 00:34 UTC
    Seems to me that you need a hash of lists -- something like this (untested):
    my %elements;
    foreach (@lines) {
        my ($key, @items);
        chomp;
        ($key, @items) = split(/,/, $_);
        push @{ $elements{$key} }, [ @items ];
    }
    That'll get you a hash, keyed by HDR, BGN, OTI or what have you. To get at all of the third items of HDR, your code is something like this:
    foreach my $row (@{ $elements{HDR} }) {
        print $row->[1], "\n";    # remember, the key has been split off and indexes start at 0
    }
    This *is* untested, but even if it's buggy, it'll be remarkably similar to the actual tested code. (I have a bit of a headache right now, and you should really see perlref and perldsc for the real scoop.) No warranty today.
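A minimal sketch of the hash-of-lists shape described above, using three records copied from the question's sample data. The `@lines` contents and the printed layout are assumptions made for illustration; this is not chromatic's tested code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three sample records taken from the question's test data (assumed input).
my @lines = (
    "HDR,831 Reconciliation,200008081441EST\n",
    "QTY,46,10\n",
    "QTY,53,0\n",
);

my %elements;
foreach my $line (@lines) {
    chomp $line;
    my ($key, @items) = split /,/, $line;
    # Each key maps to an array of rows; each row is an array of fields.
    push @{ $elements{$key} }, [@items];
}

# %elements now holds:
#   HDR => [ [ '831 Reconciliation', '200008081441EST' ] ],
#   QTY => [ [ '46', '10' ], [ '53', '0' ] ],

foreach my $key (sort keys %elements) {
    my $rows = $elements{$key};
    printf "%s: %d row(s)\n", $key, scalar @$rows;
    print "  ", join(' | ', @$_), "\n" for @$rows;
}
```

Because the key is split off before the row is stored, the "third item" on an HDR line ends up at index 1 of its row.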
RE: Creating the right type of array...
by Boogman (Scribe) on Aug 30, 2000 at 00:42 UTC
    I would use a hash which held references to arrays of arrays. You could use the first column as the key, and hold a reference to an array where each element is a reference to an array holding the data of each line that had that key. Sounds complicated, but isn't too bad once you get the hang of it.

    Here's some sample code:
    #!/usr/bin/perl
    use warnings;
    use strict;

    # "or" binds more loosely than "||", so the die applies to the open itself
    open FILE, "input" or die "Couldn't open file: $!";
    my %hash;
    while ( <FILE> ) {
        chomp;
        my ( $key, @rest ) = split /,/;
        foreach ( 0 .. 4 ) {
            $rest[$_] = "" unless ( defined( $rest[$_] ) );
        }
        push @{ $hash{$key} }, \@rest;
    }

    # To print out the first column of every row beginning with QTY
    print "$_->[0]\n" foreach ( @{ $hash{QTY} } );

    # total number of entries for the category
    print scalar( @{ $hash{QTY} } ), "\n";

    # print a specific row of data
    print "@{ $hash{QTY}[2] }\n";
    Update: Hehehe... looks like I was beaten to it... oh well... the more the merrier, right?
Re: Creating the right type of array...
by cwest (Friar) on Aug 30, 2000 at 00:36 UTC
    The point of all that ugly code below is to tell you that if you want custom views on your data, you need to make them. Use subroutines that return data or print data like the ones below. It's also there because I think that's a good data structure for you to use.
    #!/usr/local/bin/perl -w
    use strict;
    $|++;

    use constant TITLES => 1;

    my $data = {};

    while ( <DATA> ) {
        chomp;    # strip the newline so the last field stays clean
        my $row = [ split /,/ ];
        push @{$data->{$row->[0]}}, $row;
    }

    sub totals_report {
        my $data = shift;
        print "Totals Report:\n" if TITLES;
        foreach ( sort keys %{$data} ) {
            print $_ . ":\t" . @{$data->{$_}} . "\n";
        }
    }

    sub print_col {
        my $data = shift;
        my $col  = shift || 0;
        print 'Column ' . $col . " print out:\n" if TITLES;
        foreach my $type ( sort keys %{$data} ) {
            print $type . ":\n" if TITLES;
            foreach my $row ( @{$data->{$type}} ) {
                print "\t" . ( $row->[$col] || '[none]' ) . "\n";
            }
        }
    }

    totals_report( $data );
    print "\n\n";
    print_col( $data, 3 );

    __DATA__
    HDR,831 Reconciliation,200008081441EST
    BGN,11,000000158,20000808,10520000
    OTI,GH,BT,WEDI620000802,159
    AMT,2,231644
    AMT,BT,231644
    AMT,NP,0
    QTY,46,10
    QTY,53,0
    QTY,54,10
    QTY,55,0
    TED,ZZZ,BATCH AWAITING 831
    OTI,TR,CK,0021000095,159,000101416
    TED,ZZZ,MOD CHECK ON RDFI-ID FAILED
    Enjoy
    --
    Casey
    
Re: Creating the right type of array...
by adamsj (Hermit) on Aug 30, 2000 at 01:44 UTC
    A slow-to-arrive answer: This here will print out the first field in the first occurrence of each of your six keys.
    #!/usr/bin/perl
    open (SIX,"</home/adamsj/sixtext") || die "Can't open file: $!";
    @lines = <SIX>;
    close(SIX);
    foreach (@lines) {
        chomp;
        @a_line = split(/,/);
        $key = shift @a_line;
        push(@{$all_of_it{$key}},[@a_line]);
    }
    foreach (keys %all_of_it) {
        print "$all_of_it{$_}[0][0]\n";
    }
    This should give you an idea how to do it--you could look in the Perl docs for more on this. I think the best treatment of references is in Advanced Perl Programming by Sriram Srinivasan (not that I have a copy--I used to borrow someone else's), but there are a lot of other good Perl books out there. This _is_ moderately advanced Perl--if you have trouble with this example, try working up to it with this:
    foreach (@lines) {
        chomp;
        ($key, $line) = split(/,/, $_, 2);
        push(@{$all_of_it{$key}},$line);
    }
    foreach (keys %all_of_it) {
        print "$_,$all_of_it{$_}[0]\n";
    }
    What this does is split the line into the key and the remainder of the line, then puts the line into an array keyed by your, well, your key. It'll print out the entire line from the first occurrence of each $key. Once you see how this one works, go for the more complex example. Good luck--I found this tricky at first, too.
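To see what the third argument to split does in the simpler example above, here is a small sketch on one record from the question's data. The variable names `$record`, `$rest` and `@fields` are made up for illustration.

```perl
use strict;
use warnings;

# One record from the question's sample data.
my $record = "OTI,TR,CK,0021000095,159,000101416";

# A limit of 2 makes split stop after the first comma:
# the key comes back alone and everything else stays in one string.
my ($key, $rest) = split /,/, $record, 2;
print "key:  $key\n";     # key:  OTI
print "rest: $rest\n";    # rest: TR,CK,0021000095,159,000101416

# Splitting the remainder later recovers the individual fields.
my @fields = split /,/, $rest;
print "fields: ", scalar(@fields), "\n";    # fields: 5
```

Storing `$rest` as a plain string keeps the structure one level shallower (a hash of arrays of strings), at the cost of splitting again whenever a single field is needed.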
Re: Creating the right type of array...
by Anonymous Monk on Aug 30, 2000 at 23:53 UTC
    NCW, how would you grab the fourth item in ONLY the first OTI? and then the fourth in second OTI. I don't want to access them through a for loop but rather grab them individually. ex.
    print "4th item in OTIs\n";
    for $key (@{$data->{OTI}}) {
        print "  $key->[2]\n";
    }
    The output from that would be
      WEDI620000802
      0021000095
    Again, I would like to access those each individually instead of looping through them and grabbing them both at the same time.
      You just say $data->{OTI}[0][2].

      The first index indicates which line it is (i.e. 0 for the first OTI line, 1 for the second OTI line, etc.), while the second indicates the element within that line, counting from 0 after the key has been shifted off. So just write $data->{type}[line][element] to access the element you want.
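Direct indexing can be sketched as follows, assuming `$data` is a hash reference built the way ncw's program builds it (key shifted off each row). The structure is written out by hand here for the two OTI lines, and `field()` is a hypothetical helper, not part of any reply in this thread.

```perl
use strict;
use warnings;

# Hand-built copy of the structure for the two OTI lines in the test data.
my $data = {
    OTI => [
        [ 'GH', 'BT', 'WEDI620000802', '159' ],
        [ 'TR', 'CK', '0021000095', '159', '000101416' ],
    ],
};

# Hypothetical helper: returns undef instead of dying (or autovivifying
# a new hash entry) when the record type or the line does not exist.
sub field {
    my ($data, $type, $line, $element) = @_;
    return unless exists $data->{$type};
    my $row = $data->{$type}[$line];
    return defined $row ? $row->[$element] : undef;
}

print $data->{OTI}[0][2], "\n";    # WEDI620000802 (first OTI line)
print $data->{OTI}[1][2], "\n";    # 0021000095    (second OTI line)
print field($data, 'OTI', 1, 4), "\n";    # 000101416
print defined(field($data, 'XYZ', 0, 0)) ? "found" : "no such record", "\n";
```

The arrows between adjacent subscripts are optional, so `$data->{OTI}->[0]->[2]` and `$data->{OTI}[0][2]` name the same element.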
Re: Creating the right type of array...
by Anonymous Monk on Aug 31, 2000 at 00:03 UTC
    Actually I just got the damn thing to work... print "$data->{OTI}->[0]->[2]\n"; where the [0] picks the OTI line (the row, if you visualize it like a spreadsheet) and the [2] picks the column within that row.