http://qs1969.pair.com?node_id=1176960

fasoli has asked for the wisdom of the Perl Monks concerning the following question:

Hi again all. Thanks to your help I solved my first problem, thank you again. I'm after some feedback about a second problem I'm facing.

So, first I'm opening four files that each have one 3x3 matrix in them. The four matrices look like this:

1 2 3
4 5 6
7 8 9

2 3 4
4 5 6
7 8 9

1 2 3
4 5 6
7 8 9

3 4 5
4 5 6
7 8 9

Now I want to do some basic math on the elements of the matrices, namely to get their average value (and then go on to the standard deviation, which I haven't tried yet). As I said, these are four 3x3 test matrices, each one in a different file. They are test cases: my real matrices are thousands of lines long, and there are thousands of files. So I'm after a new matrix holding the element-wise average of the four matrices, i.e. each element is the sum of the corresponding elements divided by the number of matrices. For the first element of the first line, (1 + 2 + 1 + 3) / 4 = 7 / 4 = 1.75. For the second, (2 + 3 + 2 + 4) / 4 = 11 / 4 = 2.75. For the third, (3 + 4 + 3 + 5) / 4 = 15 / 4 = 3.75, and so on for all the elements, resulting in one matrix like this (first line only):

1.75 2.75 3.75
x x x
x x x

When I print the matrices with $list[$a][$b] they look correct, so I've commented that line out; it was only a test to see if they look ok. When I try to get the sum of the elements and then the mean, the output I'm getting is "Use of uninitialized value in addition (+) at test_SD_Wednesday.pl line 52" and a bunch of wrong numbers as sums/averages. I'm pretty sure I'm screwing something up badly but I can't figure out what. Any hints please? I'm starting to get scared my supervisors will really shout at me or think I'm an idiot if I tell them it's taken me 10 days to have a script that doesn't work...

Finally can someone comment on the indentation? Does it make sense the way I did it? It looks less messy but no clue if I got the philosophy behind it.

#/bin/perl/

use strict;
use warnings;
use autodie;

my $molec = "1ac6";
my $cluster;
my $times;
my $input;
my $path = "/media/RAIDstorage/home/athina/dist-analysis/${molec}/timeseries/test";
my $line;
my @files;

@files = `ls $path\/$molec-times*`;

foreach (@files) {
 my $a;
 my $b;
 my $m_avrg;
 my @m_avrg;
 undef @m_avrg;
 my @list;
 my $list;
 my $aver;

 /${molec}-times-(\d+)-clust(\d{1})/;
 $times = $1;
 $cluster = $2;

 open $input, '<', "$path\/${molec}-times-${times}-clust${cluster}.out" or die $!;

 while ($line = <$input>) {
  chomp $line;
  push @list, [split/\s+/, $line];
 } # while input

 close $input;

  for ($a=0; $a<=2; $a++) {
   for ($b=0; $b<=2; $b++) {
    #print "$list[$a][$b] "; # check matrices
    $m_avrg[$a][$b] = ($m_avrg[$a][$b] + $list[$a][$b]);
    print "$m_avrg[$a][$b] \n";
   }
   print " \n";
  }

  print "average\n";
  for ($a=0; $a<=2; $a++) {
   for ($b=0; $b<=2; $b++) {
    $m_avrg[$a][$b] = $m_avrg[$a][$b] / 4;
    print "$m_avrg[$a][$b] ";
   }
   print "\n";
  }
} # foreach file in loop

Replies are listed 'Best First'.
Re: how to get average of matrices' elements?
by choroba (Cardinal) on Nov 30, 2016 at 23:35 UTC
    To work with matrices, use PDL.

    Update

    I've got it! It's super easy. Just pile the matrices one on another, and then project the 3D matrix by average in the z dimension:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use PDL;

    sub load {
        my ($filename) = @_;
        open my $FH, '<', $filename or die $!;
        return pdl(map [split], <$FH>)
    }

    my $matrix = cat(map load($_), @ARGV);
    print average($matrix->reorder(2,0,1));

    Old contents

    I'm not yet familiar with it, so I had to compute the average myself by adding the matrices and dividing the result by their number, but maybe there already is a function to compute the average per element, or at least to apply a function per element.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use PDL;

    sub load {
        my ($filename) = @_;
        open my $FH, '<', $filename or die $!;
        return pdl(join "\n", map { chomp; "[$_]" } <$FH>)
    }

    my @matrices = map load($_), @ARGV;
    my $result = $matrices[0];
    $result += $_ for @matrices[ 1 .. $#matrices ];
    $result /= @ARGV;
    say $result;
      Tiny note: mv is an easier way than reorder to move the dimension you want to average over to position 0. It also accepts negative indices, which count backwards from the end, leading to this very common idiom:
      print $matrix->mv(-1,0)->avgover;
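    To make the stacking and averaging steps concrete, here is a small sketch that builds the four test matrices from the question in memory instead of reading them from files (that substitution is an assumption made only to keep the example self-contained). It uses the same cat, mv and avgover calls mentioned above:

```perl
use strict;
use warnings;
use PDL;

# Four 3x3 test matrices, standing in for the contents of the four files.
my @matrices = (
    pdl([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ]),
    pdl([ [2, 3, 4], [4, 5, 6], [7, 8, 9] ]),
    pdl([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ]),
    pdl([ [3, 4, 5], [4, 5, 6], [7, 8, 9] ]),
);

# cat() stacks the matrices along a new last dimension: dims (3, 3, 4).
my $stack = cat(@matrices);

# Move that last dimension to position 0, then average over it: dims (3, 3).
my $mean = $stack->mv(-1, 0)->avgover;

print $mean;
```

    The first row of the result should hold 1.75, 2.75 and 3.75, matching the hand calculation in the question.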
Re: how to get average of matrices' elements?
by hippo (Bishop) on Dec 01, 2016 at 08:37 UTC
    Finally can someone comment on the indentation? Does it make sense the way I did it? It looks less messy but no clue if I got the philosophy behind it.

    The philosophy is not tricky. Indentation is supposed to make the code easier to read by making it very obvious to the observer where each block of code and/or each statement starts and ends. This is particularly important when you end up with multi-line statements or a number of blocks which close at the same time.

    With that in mind, indenting by 1 space for each level will earn you very few friends. Even worse, you have been inconsistent. Compare the indenting of the while loop which reads the file with that of the following for loop which starts the processing: the former is less indented than the latter, which is wrong and therefore misleading to anyone reading the code.

    Use 2 spaces for each indent at the bare minimum. The most widely used width within Perl modules appears to be 4 (among the code I've seen).

    Alternatively you can use tabs. This has the advantage that you can use one character (a tab) per indent and then anyone else viewing the code can simply set the tab width to whatever they prefer. The choice of tabs vs spaces is a highly personal one, however, so be aware that you won't please everyone (see No Hard Tabs in Code). Do not use tabs for alignment.

    Finally, until you understand what you are doing with indentation (and even then too), consider perltidy.

Re: how to get average of matrices' elements?
by toolic (Bishop) on Nov 30, 2016 at 18:58 UTC
    The first time through your nested for loops, @m_avrg is uninitialized. You can initialize it with:

    my @m_avrg;
    for my $i ( 0 .. 2 ) {
        $m_avrg[$i] = [ map { $_ = 0 } 1 .. 3 ];
        # $m_avrg[$i] = [ map { $_ => 0 } 1 .. 3 ]; # WRONG!
    }

    UPDATE: I copy'n'pasted the wrong code originally. Yes, Anon's x operator is better.

    See also: Basic debugging checklist

    Use 4 single spaces for each level of indentation.

      That map is wrong, we're not building a hash. Just use $m_avrg[$i] = [ (0) x 3 ];
        That map is wrong, we're not building a hash.
        Please try running my code before you make such a claim. My code creates an array-of-arrays data structure with all elements initialized to 0.

        UPDATE: see my updated code.

      Also, another way to avoid the "uninitialized variable" warning is to use the += operator.
      instead of this:
      $m_avrg[$a][$b] = ($m_avrg[$a][$b] + $list[$a][$b]);
      use this:
      $m_avrg[$a][$b] += $list[$a][$b];
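      A short sketch of the difference, assuming a single hard-coded row in place of the data read from a file. The variable names @plain, @compound and @warnings are made up for the illustration. Perl exempts += on an undefined value from the "uninitialized" warning, whereas the explicit addition is not exempt:

```perl
use strict;
use warnings;

# Collect warnings so the difference between the two forms is visible.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my ( @plain, @compound );
my @list = ( [ 1, 2, 3 ] );

# Explicit addition: the left operand is undef, so this warns.
$plain[0][0] = $plain[0][0] + $list[0][0];

# Compound assignment: undef is treated as 0 and no warning is issued.
$compound[0][0] += $list[0][0];

printf "plain: %d, compound: %d, warnings: %d\n",
    $plain[0][0], $compound[0][0], scalar @warnings;
# prints "plain: 1, compound: 1, warnings: 1"
```

      Both forms compute the same value; only the warning differs. Neither form helps if the accumulator is re-declared for every file, as it is inside the foreach loop in the question.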
Re: how to get average of matrices' elements?
by Marshall (Canon) on Dec 01, 2016 at 16:52 UTC
    Re:Indenting.
    I saw the post by Hippo at Re: how to get average of matrices' elements?. A few extra comments...

    How many spaces to use for each indentation level is actually something that has been academically studied. The answer is "3 or 4 spaces". 2 is too few for good readability and 5 winds up taking up more space while not improving readability. 3 or 4 appear to be almost the same. Certainly 1 is too few. The human eye will just get lost.

    To use tabs or not in the code is something that can start a long, very emotional discussion. I personally do not put tabs anywhere in the code or comments.

    If you use an editor that is designed to be used for writing code, there will be special features that make it easy to enforce whatever style you prefer. For example, my editor has an option, "convert tabs to spaces". Without doing something special, I can't wind up with any embedded tab characters.

    There is a fair amount of variability on the "braces style". One way is like you did it. For Perl code, I prefer to put the initial opening brace on its own line. Like this:

    for ($a=0; $a<=2; $a++)
    {
        for ($b=0; $b<=2; $b++)
        {
            $m_avrg[$a][$b] = ($m_avrg[$a][$b] + $list[$a][$b]);
            print "$m_avrg[$a][$b] \n";
        }
    }
    I find that easier to read. But again, mileage varies a lot! You can make up your own mind about that.

    For other languages like Java, I use the more vertically compact form because there winds up being a whole mess of little itty bitty "getters and setters". So I am flexible about this point, depending upon the situation.

Re: how to get average of matrices' elements?
by Anonymous Monk on Nov 30, 2016 at 19:03 UTC
    You're dividing m_avrg by 4 once per file. Move the averaging outside the file loop.
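    A minimal pure-Perl sketch of that fix, assuming the matrices have already been read into array-of-arrays structures (here they are hard-coded from the test data in the question). The sub name average_matrices is hypothetical. The running sum lives outside the per-matrix loop, and the division happens once, after every matrix has been added:

```perl
use strict;
use warnings;

# Hypothetical helper: element-wise average of equally sized matrices,
# each passed as a reference to an array of row arrays.
sub average_matrices {
    my @matrices = @_;
    my @sum;    # declared once, outside the loop over matrices

    for my $matrix (@matrices) {
        for my $i ( 0 .. $#$matrix ) {
            for my $j ( 0 .. $#{ $matrix->[$i] } ) {
                $sum[$i][$j] += $matrix->[$i][$j];
            }
        }
    }

    # Divide only after all matrices have been summed.
    my $count = @matrices;
    my @avg;
    for my $i ( 0 .. $#sum ) {
        for my $j ( 0 .. $#{ $sum[$i] } ) {
            $avg[$i][$j] = $sum[$i][$j] / $count;
        }
    }
    return \@avg;
}

my @matrices = (
    [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ],
    [ [ 2, 3, 4 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ],
    [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ],
    [ [ 3, 4, 5 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ],
);

my $avg = average_matrices(@matrices);
print join( ' ', @$_ ), "\n" for @$avg;
```

    With the four test matrices this prints 1.75 2.75 3.75 on the first line, followed by 4 5 6 and 7 8 9. In the original script the same idea would mean declaring @m_avrg before the foreach over @files and running the "average" loop after it, dividing by the number of files rather than a hard-coded 4.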