in reply to Poor performances with threads
It doesn't help that you are making so many copies of your data and results:
    sub do_work {
        my @working_block = @input_data[$start .. $stop];    ## COPY 1
        my @partial_output_data;
        $, = undef;
        $\ = undef;
        my $output_line = 0;
        for (my $i = 0; $i < @working_block; $i += $files) {
            # calculates the stddev of each slice and outputs the formatted results
            if (max(@working_block[$i .. $i + $files - 1]) == 0) {    ## COPY 2
                $partial_output_data[$output_line] = '0.00000E+00';
            }
            else {
                $partial_output_data[$output_line] = sprintf('%.5E',
                    stddev(@working_block[$i .. $i + $files - 1]) /    ## COPY 3
                    mean(@working_block[$i .. $i + $files - 1]));      ## COPY 4
            }
            $output_line++;
        }
        @ReturnData[$start / $files .. $stop / $files] = @partial_output_data;    ## COPY 5
        return;
    }
And three of those copies are lists pushed onto the stack, which then get copied again, once and often twice, into data structures internal to Statistics::Basic.
I suspect, though I cannot verify it without trying to re-create your script (why not post the whole thing if you want help?), that you are spending most of your time allocating memory and copying data rather than calculating.
Here is a simple script that calculates the coefficient of variation of 55k 7-element datasets in just over half a second:
    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];
    use Data::Dump qw[ pp ];
    use List::Util qw[ sum ];

    sub coffOfVar {
        my $mean = sum( @_ ) / @_;
        return 0 unless $mean;
        my @dif2s = map{ ( $_ - $mean )**2 } @_;
        ## sample standard deviation: square root of the sum of squares over n-1
        my $stddev = sqrt( sum( @dif2s ) / $#_ );
        return $stddev / $mean
    }

    my @data;
    push @data, [ map rand( 2000 )-1000, 1..7 ] for 1 .. 55*1024;

    my $start = time;
    my @results;
    push @results, sprintf '%.5E', coffOfVar( @$_ ) for @data;

    printf "Took %.2f seconds to find the coffOfVar of %d datasets\n",
        time - $start, scalar @data;

    __END__
    C:\test>802855.pl
    Took 0.59 seconds to find the coffOfVar of 56320 datasets
Re-using other people's code is all well and good. But when it leads to you writing and supporting far more complex code yourself in order to meet your performance requirements, because you constantly have to restructure the natural state of your data to force-fit it to the modules you use, and you have no choice but to inherit a bunch of functionality (or just the preparation for that functionality) that you do not require, then it is time to consider extracting just the code you actually need from the module and making your life simpler.
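As a rough sketch of what that might look like for your worker routine: the version below works directly on the input array through a reference and writes each result straight into the output array, so the working-block and partial-output copies disappear. It is untested against your real script. The argument list of this do_work, the helper name coeff_of_var and the sample data are all my own assumptions, and it treats a zero mean (rather than a zero max) as the "output zero" case, which only matches your original if your data is never negative.

```perl
use strict;
use warnings;
use List::Util qw[ sum ];

# Coefficient of variation: sample standard deviation divided by the mean.
# Hypothetical helper; returns 0 when the mean is 0 to avoid dividing by zero.
sub coeff_of_var {
    my $mean = sum( @_ ) / @_;
    return 0 unless $mean;
    my $sum_sq = sum( map { ( $_ - $mean )**2 } @_ );
    return sqrt( $sum_sq / $#_ ) / $mean;    # $#_ is n - 1
}

# Assumed interface: references to the input and output arrays, the first and
# last input index to process, and the slice width ($files in your code).
# Assumes $start is a multiple of $files.
sub do_work {
    my( $in, $out, $start, $stop, $files ) = @_;
    for( my $i = $start; $i <= $stop; $i += $files ) {
        # One slice per dataset; the result goes straight to its final slot.
        $out->[ $i / $files ] = sprintf '%.5E',
            coeff_of_var( @{ $in }[ $i .. $i + $files - 1 ] );
    }
    return;
}

# Made-up data: two 7-element datasets laid end to end.
my @input_data = ( 1 .. 7, ( 0 ) x 7 );
my @ReturnData;
do_work( \@input_data, \@ReturnData, 0, $#input_data, 7 );
print "$_\n" for @ReturnData;
```

Each dataset is still passed to coeff_of_var as a list, so one copy per slice remains; whether that matters for your timings is something only a run against your real data will show.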