Re: Averaging Elements in Array of Array
by ikegami (Patriarch) on Dec 26, 2008 at 10:01 UTC
my @sum;

# Accumulate per-column totals.
for my $row (@aoa) {
    for my $col ( 0 .. $#$row ) {
        $sum[$col] += $row->[$col];
    }
}

# Turn totals into averages.
for (@sum) {
    $_ /= @aoa;
}
Update: Changed $sum /= @aoa; to $_ /= @aoa;
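For example, a minimal self-contained script (the sample data here is hypothetical, not from the OP) could exercise the snippet above and print the per-column averages:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: 3 rows of 4 columns each.
my @aoa = ( [ 0, 3, 2, 1 ], [ 1, 11, 1, 2 ], [ 5, -2, 0, 1 ] );

my @sum;
for my $row (@aoa) {
    for my $col ( 0 .. $#$row ) {
        $sum[$col] += $row->[$col];
    }
}
for (@sum) {
    $_ /= @aoa;
}

# One average per column: 2 4 1 1.33...
print "@sum\n";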
Re: Averaging Elements in Array of Array
by hda (Chaplain) on Dec 26, 2008 at 13:44 UTC
Hi!
Yes, there is a fast way, using PDL (the Perl Data Language):
http://pdl.perl.org/
Namely you can import your data into a piddle (a PDL object), slice the column and then perform "stats" over that:
use PDL;
$piddle = pdl @your_array;
$a = slice $piddle, "$column,:";
print stats $a;
Where "$column,:" specifies that you want the whole y range (:) of column $column dumped into $a. You can, for example, iterate over all the columns you want. Other, more advanced uses are certainly possible, such as taking all the averages at the same time, but those are beyond my current knowledge.
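For reference, taking all the column averages at once might look something like this rough sketch (the sample rows are hypothetical; transpose and average are standard PDL methods):

use PDL;

# Hypothetical sample rows; any array of equal-length array refs works.
my @your_array = ( [ 0, 3, 2, 1 ], [ 1, 11, 1, 2 ], [ 5, -2, 0, 1 ] );

my $piddle = pdl @your_array;    # dims are (columns, rows)

# average() collapses the first dimension, so transpose first
# to get one average per column across all rows.
my $col_averages = $piddle->transpose->average;
print $col_averages, "\n";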
Hope this helps
I thought so too, but here's the benchmark for it:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;
use PDL::LiteF;

my @data = ( [ 0, 3, 2, 1 ], [ 1, 11, 1, 2 ], [ 5, -2, 0, 1 ] );

# It's not fair to make the conversion every time.
my $pdldata = pdl @data;

sub using_array {
    my @data = @_;
    my @sums;
    for my $i ( 0 .. $#data ) {
        $sums[0] += $data[$i][0];
        $sums[1] += $data[$i][1];
        $sums[2] += $data[$i][2];
        $sums[3] += $data[$i][3];
    }
    $sums[$_] /= @data for 0 .. 3;
    return @sums;
}

sub using_pdl {
    my $pdldata = shift;
    # Work on a copy so repeated benchmark calls don't keep
    # dividing the same shared piddle in place.
    my $copy = $pdldata->copy;
    $copy /= $copy->getdim(1);
    return $copy->transpose->sumover;
}

cmpthese(
    100000,
    {   'Array-based' => sub { using_array(@data) },
        'PDL-based'   => sub { using_pdl($pdldata) },
    }
);
Result:
                 Rate PDL-based Array-based
PDL-based     36496/s        --        -67%
Array-based  111111/s      204%          --
Apparently, for a dataset of this size (3 by 4) it's not worth using PDL. The good thing, though, is that the PDL version works on a two-dimensional piddle of arbitrary size without modifying the subroutine. I suspect PDL scales much better, too: I've used it on multidimensional piddles of 1e7 elements with a 50-fold increase in speed over a traditional array-based implementation.
Bruno, you are completely right: in this case the use of PDL is justified when working with large arrays. I just supposed that neversaint's array was just an example and that the real problem was a bit more complicated.
Building on the shoulders of giants, here's a more generalized benchmarker. It shows that while the PDL approach is slower for a 5x5 matrix, it quickly becomes the choice for speed as the matrix grows. For example, given a 30x30 matrix, the PDL method averages its columns about 7 times faster than the conventional methods. Imagine the gains with dimensions in the hundreds or thousands.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;
use PDL::LiteF;

my @number_of_arrays = qw(5 15 30);
my @size_of_arrays   = qw(5 15 30);
my $iterations       = 50000;
my $max_integer      = 100;

benchmark_it( \@number_of_arrays, \@size_of_arrays, $max_integer );

#----------------
sub benchmark_it {
    my $number_of_arrays   = shift;
    my $size_of_arrays     = shift;
    my $max_random_integer = shift;
    for my $number ( @{$number_of_arrays} ) {
        for my $size ( @{$size_of_arrays} ) {
            my $data =
                build_random_array( $number, $size, $max_random_integer );
            my $pdldata = pdl $data;
            print "Results when number of arrays is $number "
                . "and size of each array is $size:\n";
            cmpthese(
                $iterations,
                {   'Array-based' => sub { using_array($data) },
                    'PDL-based'   => sub { using_pdl($pdldata) },
                    'Map-based'   => sub { using_map($data) },
                }
            );
            print "\n";
        }
    }
}
sub using_array {
    my $data = shift;
    my @sums;
    my $last_row_index    = scalar @{$data} - 1;
    my $last_column_index = scalar @{ $data->[0] } - 1;
    for my $i ( 0 .. $last_row_index ) {
        for my $j ( 0 .. $last_column_index ) {
            $sums[$j] += $data->[$i][$j];
        }

        # Hard-coded indices run faster:
        # $sums[0] += $data->[$i][0];
        # $sums[1] += $data->[$i][1];
        # $sums[2] += $data->[$i][2];
        # $sums[3] += $data->[$i][3];
    }
    $sums[$_] /= ( $last_row_index + 1 ) for 0 .. $last_column_index;
    return @sums;
}
sub using_map {
    my $data      = shift;
    my $range_max = scalar @{ $data->[0] } - 1;
    my @sums;
    map {
        for my $j ( 0 .. $range_max ) {
            $sums[$j] += $_->[$j];
        }
    } @{$data};
    return \@sums;
}
sub using_pdl {
    my $pdldata = shift;
    # Work on a copy so repeated benchmark calls don't keep
    # dividing the same shared piddle in place.
    my $copy = $pdldata->copy;
    $copy /= $copy->getdim(1);
    return $copy->transpose->sumover;
}
sub build_random_array {
    my $number_of_arrays = shift || 10;
    my $size_of_arrays   = shift || 10;
    my $max_integer      = shift || 100;
    my $data;
    foreach my $i ( 1 .. $number_of_arrays ) {
        my @random_array;
        push @random_array, int rand( $max_integer + 1 )
            for ( 1 .. $size_of_arrays );
        push @{$data}, \@random_array;
    }
    return $data;
}
__END__

=head1 Synopsis

Compare PDL to more conventional methods of finding the average of the column
vectors in a 2D matrix.

=head1 Results

My results on December 27, 2008:

 Results when number of arrays is 5 and size of each array is 5:
                Rate PDL-based Array-based Map-based
 PDL-based   42017/s        --        -50%      -57%
 Array-based 84746/s      102%          --      -14%
 Map-based   98039/s      133%         16%        --

 Results when number of arrays is 30 and size of each array is 30:
                Rate Array-based Map-based PDL-based
 Array-based  3987/s          --      -17%      -89%
 Map-based    4808/s         21%        --      -86%
 PDL-based   35461/s        789%      638%        --

=head1 Notes

When the matrix is small (5x5), PDL is slower, but as the size of the matrix
grows, PDL becomes smokin' hot thanks to its speed.

It's nice to see the recent development activity with PDL.

=cut
Re: Averaging Elements in Array of Array
by JavaFan (Canon) on Dec 26, 2008 at 12:16 UTC
my @sums;
for my $i ( 0 .. $#data ) {
    $sums[ 0 ] += $data[$i][ 0 ];
    $sums[ 1 ] += $data[$i][ 1 ];
    $sums[ 2 ] += $data[$i][ 2 ];
    $sums[ 3 ] += $data[$i][ 3 ];
}
$sums[ $_ ] /= @data for 0 .. 3;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Averaging Elements in Array of Array
by Anonymous Monk on Dec 27, 2008 at 00:45 UTC
You could try something like this:
for ( my $i = 0; $i < scalar(@aoa); $i++ ) {
    for ( my $x = 0; $x < scalar($aoa[$i]); $x++ ) {
        $hash{$i} = $hash{$i} + $aoa[$i][$x];
    }
}
my @res = ();
foreach my $key (sort keys %hash) {
    push(@res, (($hash{$key}) / 4));
}
If you don't know the length of the aoa then you could make the '4' in '/ 4' into a var. Or you could do a hash that has more of a struct feel to it like:
$hash{$i}{'total'} = <total of the one column of values>
$hash{$i}{'array_length'} = #
Not sure how much faster this would be than what you have, but it should be fairly fast since you only go through the array of arrays once and then compute the average once.
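A rough sketch of that struct-like hash might look like the following (my illustration, assuming @aoa holds equal-length rows; keys are column indices, and the numeric sort keeps the order sane past 10 columns):

my %hash;
for my $i ( 0 .. $#aoa ) {
    for my $x ( 0 .. $#{ $aoa[$i] } ) {
        $hash{$x}{total}        += $aoa[$i][$x];
        $hash{$x}{array_length} += 1;
    }
}

my @res;
for my $key ( sort { $a <=> $b } keys %hash ) {
    push @res, $hash{$key}{total} / $hash{$key}{array_length};
}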
George in NC
This is a slower and less readable version of what I posted 15 hours ago.
- $aoa[$i] (in $x < scalar($aoa[$i])) is buggy. It should be @{ $aoa[$i] }
- Your code (specifically your use of sort) fails when there are more than 10 columns in each row.
- Why do you hardcode "4" in one place but use "scalar(@aoa)" in another? That should tell you something's wrong.
- There's no reason to use a hash here. It's not appropriate, and it just slows things down for nothing.
- "hash" is an awful name for a variable.
- $x = $x + $y; can be written more concisely as $x += $y;.
- scalar(@aoa) can be written more concisely as @aoa if the context is already scalar.
- for (my $i=0; $i<@aoa; $i++) can be written more concisely as for my $i (0..$#aoa).
Finally, your post formatting is buggy. Put code and other preformatted data in <c>...</c> tags.
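For what it's worth, a minimal sketch of the same idea with those points applied (plain arrays, no hash, no hard-coded row length) might look like:

my @totals;
for my $i ( 0 .. $#aoa ) {
    for my $x ( 0 .. $#{ $aoa[$i] } ) {
        $totals[$x] += $aoa[$i][$x];
    }
}

# Divide each column total by the number of rows.
my @res = map { $_ / @aoa } @totals;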
I apologize; it was more of an idea than an actual piece of code. I didn't see it written anywhere in the other pieces, so I thought I'd throw it out there. If someone is interested in finding the best way to parse something, it's best to see or try as many options as possible before deciding what is best.
The code can definitely be made more concise or otherwise improved, and no, I wouldn't use this specific code in a program, but it would be similar. The '4', along with any other part that is not to your liking, can be modified depending on the needs of the app or the preference of the programmer. That's the great thing about Perl... there are twenty ways to do something and twenty different opinions on what is best...