virtualweb has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks:

I need help with a slight variation of a code example I found in an O'Reilly book, regarding arrays of arrays.
(http://lux.e-reading.bz/htmbook.php/orelly/perl4/prog/ch09_01.htm#FOOTNOTE-1).

The book explains "we omit (for clarity) the my declarations that you would ordinarily put in."

The snippet I have reads a text file with the following content:

10 20 30 40 50
15 25 35
1 2 3 4 5 6 7 8 9 10


If I need the sum total of each line, this is the code, where @AoA contains the three lines above:

## Load values into the Array ###
$Test_File = "test_file.txt";
open (TESTFILE, "$Test_File");
while (<TESTFILE>) {
    @tmp = split;          # Split elements into an array.
    push @AoA, [ @tmp ];   # Add an anonymous array reference to @AoA.
}
close (TESTFILE);

## Print the totals for each line ###
for $i ( 0 .. $#AoA ) {
    $row = $AoA[$i];
    for $j ( 0 .. $#{$row} ) {
        $Total_Balance[$i] += "$row->[$j]";
    }
    print "the total is: ($Total_Balance[$i])<hr>";
}

The output for the above code is:

         the total is: (150)
         the total is: (75)
         the total is: (55)


I need the same output, but I changed the text file a little: the row number is on the left and the value to add up is on the right, as follows:

1|10
1|20
1|30
1|40
1|50
2|15
2|25
2|35
3|1
3|2
3|3
3|4
3|5
3|6
3|7
3|8
3|9
3|10


For this example I only have three anonymous arrays, but there could be 10 or 30 (the number is unknown).
Thanx beforehand for your help

Replies are listed 'Best First'.
Re: array of arrays
by kevbot (Vicar) on Jun 11, 2017 at 06:02 UTC
    This can be done using a hash.
    #!/usr/bin/env perl
    use strict;
    use warnings;

    open my $fh, '<', 'test_file.txt' or die 'Can not open file';

    my %data;
    while(<$fh>){
        chomp $_;
        my ($key, $val) = split(/\|/, $_);
        $data{$key} += $val;
    }

    foreach my $key ( sort { $a <=> $b } keys %data ){
        print "the total is: (".$data{$key}.")\n";
    }

    exit;

      Hello kevbot:

      Thank you for your solution. Since this task is for me to learn, it took me a while to study the replies one by one from the bottom up. Yours was the last.

      I would like to know if you could show me how to print, on the fly, each row of the while loop with the array size and the running sum total of each array.

      Something like this:

      my %data;
      $e = '0';
      while(<$fh>){
          $e++;
          chomp $_;
          my ($key, $val) = split(/\|/, $_);
          push @{$data{$key}},$val;
          $Grand_Total += "$val";
          my $size = scalar(@{$data{$key}});      ## WRONG doesn't print
          $Sub_Total[$key] += $data{$key}[$val];  ## WRONG doesn't print
          print"Row ($e) / Array Size [$size] key ($key) / Amount ($val) / Sub Total ($Sub_Total[$key]) / Grand_Total = ($Grand_Total)<hr>";
      }


      Hope you can do it

      Thanx
      virtualweb

        Replace

        $Sub_Total[$key] += $data{$key}[$val];
        with
        $Sub_Total[$key] +=$val;
        You should then come back and tell us what you did wrong in the original.

        Hello virtualweb,

        My solution did not use an array. The contents of $data{$key} are a single scalar value (the running total of values for the given key). So, my code is not keeping track of how many data entries are encountered for a given value of $key.

        Here is a modified version of my code that will print out the information you request. Note, I'm still not using arrays. In this code, $data{$key} contains a hash reference with keys size and total. The value of size is the current number of elements found for the given $key. The value of total is the current total of values that have been encountered for the given $key.

        #!/usr/bin/env perl
        use strict;
        use warnings;

        open my $fh, '<', 'test_file.txt' or die 'Can not open file';

        my %data;
        my $row_number = 1;
        my $grand_total = 0;
        while(<$fh>){
            chomp $_;
            my ($key, $val) = split(/\|/, $_);
            $data{$key}->{'size'}++;
            $data{$key}->{'total'} += $val;
            $grand_total += $val;
            print "Row ($row_number) ".
                  "/ Array Size [$data{$key}->{'size'}] key ($key) ".
                  "/ Amount ($val) / Sub Total ($data{$key}->{'total'}) ".
                  "/ Grand_Total = ($grand_total)\n";
            ++$row_number;
        }

        foreach my $key ( sort { $a <=> $b } keys %data ){
            print "the total is: (".$data{$key}->{'total'}.")\n";
        }

        exit;
Re: array of arrays
by Marshall (Canon) on Jun 11, 2017 at 06:49 UTC
    Yes, kevbot's solution with a hash is good. Here is what I typed while kevbot was also typing. It preserves the numbers in each row in case more calculations are needed. If there are only 30 rows, a hash of arrays is a good idea. If there are 100,000 rows, that advice would change. Here the hash gets rid of the need to deal with index [0] of the array of arrays (2-D array). A more memory-efficient approach is to process the input row by row and output a result line whenever the first number changes.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %HoA;   # a Hash of Array

    while (my $line = <DATA>)
    {
        my ($bucket, $num) = $line =~ m/^\s*(\d+)\s*\|\s*(\d+)/;
        push @{$HoA{$bucket}},$num;
    }

    my $total;
    foreach my $key (sort {$a<=>$b} keys %HoA)
    {
        my $line_total;
        foreach my $num (@{$HoA{$key}})
        {
            $line_total += $num;
        }
        print "Line $key total = $line_total\n";
        $total += $line_total;
    }
    print "Grand Total = $total\n";

    =Prints
    Line 1 total = 150
    Line 2 total = 75
    Line 3 total = 55
    Grand Total = 280
    =cut

    __DATA__
    1|10
    1|20
    1|30
    1|40
    1|50
    2|15
    2|25
    2|35
    3|1
    3|2
    3|3
    3|4
    3|5
    3|6
    3|7
    3|8
    3|9
    3|10

      Hi Marshall:

      Thanx for this solution. I was going from bottom to top analyzing all the suggestions, which is why, under your other solution, I asked for what you had already coded here. (I didn't see it till now.)

      I wonder if you can help me print the size of each anonymous array ($count) and all the results (number, sub total and grand total) in one line inside the while loop, something like this:

      my $e = '0';
      while (my $line = <DATA>)
      {
          $e++;
          my ($bucket, $num) = $line =~ m/^\s*(\d+)\s*\|\s*(\d+)/;
          push @{$HoA{$bucket}},$num;
          my $grand_total += $num;

          ## Sub Total = the running total per line up to 150, 75 and 55.?
          my $sub_total = "????";

          ## the size of each anonymous array ##
          @Count[$bucket] += $num;   ## I know this is super wrong ##

          print"Row $e / Number Array ($bucket) / Num ($num) / Array Size ($Count[$bucket])/ Sub Total ($sub_total) / Grand_Total ($grand_total)/n";
      }


      Thanx beforehand

        Ok, glad that you saw my 2 solutions with your 2 different data formats. This is easier if you make an AoA or a HoA and then as a second step, do the sums. In my HoA solution the "number of numbers" is just the scalar value of @{$HoA{$key}}. Something like print "elements in array=".@{$HoA{$key}}."\n" should work.
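The point about taking the "number of numbers" from the scalar value of the array can be sketched as follows. This is only an illustration, not code from the thread: the helper name `bucket_stats` and the sample `%HoA` contents are assumptions, and `sum` comes from the core List::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not from the thread): for each key of a hash of
# arrays, record how many numbers the array holds and what they add up to.
sub bucket_stats {
    my ($hoa) = @_;
    my %stats;
    for my $key (keys %$hoa) {
        my @nums = @{ $hoa->{$key} };
        # An array evaluated in scalar context yields its element count.
        $stats{$key} = { count => scalar(@nums), total => sum(@nums) };
    }
    return \%stats;
}

# Sample data assumed for the demo (first two rows of the original file).
my %HoA = ( 1 => [10, 20, 30, 40, 50], 2 => [15, 25, 35] );
my $stats = bucket_stats(\%HoA);
for my $key (sort { $a <=> $b } keys %$stats) {
    print "Line $key: elements in array=$stats->{$key}{count}, "
        . "total=$stats->{$key}{total}\n";
}
```

With the sample data this reports 5 elements and a total of 150 for line 1, and 3 elements and a total of 75 for line 2. Building the structure first and computing the statistics in a second step keeps each step free of decisions.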

        Now of course it is not necessary to even fiddle with an AoA or a HoA. You can just keep a running sum as you go. When the "line no" changes, print the current line results and start a "new line". The disadvantage is that the program logic is a bit more complicated, because you have to figure out "on the fly" when a new "line" starts and when it finishes.

        As an example, I coded one way to do this without creating the intermediate AoA or the HoA. This code of course uses less memory, but that is probably not even a remote consideration for your application. Nowadays a temporary data structure with 100's of MB's is nothing! The "expense" of using less memory is the extra complication of more decisions. Not all lines of code are "equal". Lines that make decisions are more error prone than ones that don't. For short, non-critical "utilities" I prefer the simplest program logic that "gets the job done" because the code is less likely to have a bug. Sometimes I work on some module that although what it does is "simple", it must be made very efficient for the overall system to work (maybe it is used often or processes a lot of data). In that situation a lot more work in coding and testing is required. Programming is part science and part art.

        So here is yet another way... If you want to have a count of the "number of numbers" in each line, then set up a variable that is incremented every time that $line_total is changed (either by assignment or by addition of an additional value). I leave that as an exercise should you desire. When looping, there are often 3 phases to consider: a) how to get the loop started, b) what the loop normally does and c) what happens to finish the loop. Rather than starting the coding with (a), with experience you will code (b) first and then figure out how to make (a) and (c) happen.

        I do hope that my point about avoiding indices when possible sunk in. Anyway, as a demo exercise, an algorithm that does not create a full memory representation of the data, but rather calculates as it goes:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $line_total=0;
        my $total = 0;
        my $current_bucket = undef;

        while (my $line = <DATA>)
        {
            my ($bucket, $num) = $line =~ m/^\s*(\d+)\s*\|\s*(\d+)/;
            if (!defined($current_bucket))  # start the first "bucket".
                                            # use of defined() instead of zero
                                            # as a flag allows for a "zero"
                                            # bucket which I added as a
                                            # test case.
            {
                $line_total = $num;
                $current_bucket = $bucket;
            }
            elsif ($bucket == $current_bucket)  # "normal" case
            {
                $line_total += $num;
            }
            else  # a new "bucket" starts...
            {
                # output current bucket's results
                print "Line $current_bucket = $line_total\n";
                $total += $line_total;

                # We've already read a line for the next bucket.
                # Adjust values to start $line_total running for this
                # new "bucket"
                $line_total = $num;
                $current_bucket = $bucket;
            }
        }
        # print the last bucket's results to finalize output:
        print "Line $current_bucket = $line_total\n";
        $total += $line_total;

        ## This is the total result
        print "total=$total\n";

        =Prints
        Line 0 = 10
        Line 1 = 150
        Line 2 = 75
        Line 3 = 55
        total=290
        =cut

        __DATA__
        0|10
        1|10
        1|20
        1|30
        1|40
        1|50
        2|15
        2|25
        2|35
        3|1
        3|2
        3|3
        3|4
        3|5
        3|6
        3|7
        3|8
        3|9
        3|10
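One possible take on the counting exercise mentioned above is sketched here. It is not Marshall's code: the helper name `stream_totals`, the choice to return the results rather than print them, and the decision to skip lines that do not match the pattern are all assumptions made for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): streams "bucket|number" lines
# and returns one [bucket, count, total] triple per run of equal buckets,
# without keeping the individual numbers in memory.
sub stream_totals {
    my @lines = @_;
    my @results;
    my ($current_bucket, $line_total, $count);
    for my $line (@lines) {
        # Assumption: lines that do not match are silently skipped.
        my ($bucket, $num) = $line =~ m/^\s*(\d+)\s*\|\s*(\d+)/
            or next;
        if (defined $current_bucket && $bucket == $current_bucket) {
            $line_total += $num;    # "normal" case: same bucket continues
            $count++;               # the counter changes with $line_total
        }
        else {
            # A new bucket starts: flush the finished one, if there was one.
            push @results, [$current_bucket, $count, $line_total]
                if defined $current_bucket;
            ($current_bucket, $line_total, $count) = ($bucket, $num, 1);
        }
    }
    # Finish the loop: flush the last bucket.
    push @results, [$current_bucket, $count, $line_total]
        if defined $current_bucket;
    return @results;
}

# Sample input assumed for the demo.
my @sample = ("0|10\n", "1|10\n", "1|20\n", "1|30\n", "2|15\n");
for my $r (stream_totals(@sample)) {
    my ($bucket, $count, $total) = @$r;
    print "Line $bucket: count=$count total=$total\n";
}
```

For the sample input this reports count=1 total=10 for line 0, count=3 total=60 for line 1, and count=1 total=15 for line 2. Like the code above, it relies on all lines of a bucket being adjacent in the input.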
Re: array of arrays
by Marshall (Canon) on Jun 11, 2017 at 09:02 UTC
    I re-looked at your original code and my brain hurts!

    It is of course possible to use indices to access a 2-D array in Perl, however this is not the normal situation. A far, far more normal situation is to access each row as an array of values. This is also true in C albeit with different syntax than this.

    I re-wrote your code below.
    These integer index buddies of i and j just don't appear that often in Perl code. Of course Perl allows that syntax. Note that by "not often", I do not mean "never". The most common errors in programming are memory allocation errors and "off by one" errors when using array indices or when looping. Perl for the most part takes care of memory allocation for you in a very efficient way - you don't have to worry about it unless you are doing something really fancy. This "off by one error" stuff can be much more problematic. In general don't use i or j indices unless you have to.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @AoA;
    while (my $line = <DATA>)
    {
        my @tmp = split ' ',$line;
        push @AoA, [ @tmp ];
    }

    ## Print the totals for each line ###
    ## and the final grand_total ###
    my $grand_total;
    foreach my $row_ref (@AoA)
    {
        my $line_total;
        foreach my $num (@$row_ref)
        {
            $line_total += $num;
        }
        print "Line Total: $line_total\n";
        $grand_total += $line_total;
    }
    print "Grand Total: $grand_total\n";

    =Prints:
    Line Total: 150
    Line Total: 75
    Line Total: 55
    Grand Total: 280
    =cut

    __DATA__
    10 20 30 40 50
    15 25 35
    1 2 3 4 5 6 7 8 9 10

      Hello Marshall:

      I like your solution. It does away with indices, as you say. I copied it to my file and it works great.

      Now how would you do the same thing if the test_file.txt would be like this..??

      1|10
      1|20
      1|30
      1|40
      1|50
      2|15
      2|25
      2|35
      3|1
      3|2
      3|3
      3|4
      3|5
      3|6
      3|7
      3|8
      3|9
      3|10

        kevbot and Marshall have already given solutions that handle the 1|10 organization of data. What was wrong with these approaches? If you can identify and clearly explain shortcomings, I'm sure they can be addressed.


        Give a man a fish:  <%-{-{-{-<

Re: array of arrays
by BillKSmith (Monsignor) on Jun 11, 2017 at 14:11 UTC
    Your existing code will work with very little change.
    while (<TESTFILE>) {
        #@tmp = split;          # Split elements into an array.
        #push @AoA, [ @tmp ];   # Add an anonymous array reference to @AoA.
        @tmp = split /\|/;
        $AoA[$tmp[0]-1] = []if !defined $AoA[$tmp[0]-1];
        push @{$AoA[$tmp[0]-1]}, $tmp[1];
    }
    Bill

      Hi Bill:
      Thanx for your suggestion. I like your solution very much because it only adds a few lines to the existing example instead of rebuilding the whole thing. How would I print the totals?

        Modifying existing software to support new requirements is called "Maintenance". (A poor choice of words, but we're stuck with it.) Few of us ever get the luxury of starting over, even when it would be cheaper in the long term. In the short term, it is almost always faster to cram in one more change. In that spirit, I suggest:

        my $Grand_Total = 0;
        for $i ( 0 .. $#AoA ) {
            $row = $AoA[$i];
            for $j ( 0 .. $#{$row} ) {
                $Total_Balance[$i] += "$row->[$j]";
            }
            print "the total is: ($Total_Balance[$i])<hr>\n";
            $Grand_Total += $Total_Balance[$i];
        }
        print "Grand Total is: ($Grand_Total)<hr>\n";

        But wouldn't you prefer to write:

        use strict;
        use warnings;
        use List::Util qw(sum);

        my @AoA;
        while (<DATA>) {
            my ($index, $value) = split /\|/;
            push @{ $AoA[$index-1] }, $value;
        }
        my @Total_Balance = map {sum( @$_ )} @AoA;
        print "The total is: ($_)\n" foreach @Total_Balance;
        print "Grand total is: (", sum( @Total_Balance ), ")\n";

        __DATA__
        1|10
        1|20
        1|30
        1|40
        1|50
        2|15
        2|25
        2|35
        3|1
        3|2
        3|3
        3|4

        This design is neither fast nor small. Its merit is that each pass through the data can only do one thing. You can understand, validate, or modify any section without concern about side effects.

        Bill
Re: array of arrays
by Anonymous Monk on Jun 11, 2017 at 01:47 UTC
    perlintro, perlreftut, Generation of an ARRAY OF ARRAYS
    $ perl -F'\|' -nale '$x[$F[0]-1]+=$F[1]}{print"the total is: ($_)"for@x' input.txt
    the total is: (150)
    the total is: (75)
    the total is: (55)

    You might need different quoting on -F depending on your shell. Add option -MO=Deparse to see the longer code.
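For readers who find the one-liner dense, here is a hand-written sketch of roughly what its switches amount to. It is not the literal -MO=Deparse output: the helper name `row_totals` and the in-memory sample lines are assumptions, used so the example runs without an input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): spells out what the one-liner's
# -n, -a, -l and -F switches do implicitly.
sub row_totals {
    my @lines = @_;
    my @x;                              # same role as @x in the one-liner
    for my $line (@lines) {             # -n supplies the read loop
        chomp $line;                    # -l strips the trailing newline
        my @F = split /\|/, $line;      # -a with -F'\|' autosplits into @F
        $x[ $F[0] - 1 ] += $F[1];       # row N accumulates in slot N-1
    }
    return @x;
}

# The code after "}{" in the one-liner runs once, after the loop has ended.
# Sample lines assumed for the demo.
my @sample = ("1|10\n", "1|20\n", "2|15\n", "2|25\n", "3|1\n");
print "the total is: ($_)\n" for row_totals(@sample);
```

With the sample lines this prints totals of 30, 40 and 1. As in the one-liner, a gap in the row numbers would leave an undefined slot in @x, so the approach assumes the rows are numbered consecutively from 1.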