Your method requires all the data to be in memory at once, but it's simple to refactor it so that's not the case.

my ($cnt, $sum, $squ);
while (my ($d) = $iter->()) {
    $cnt++;
    $sum += $d;
    $squ += $d * $d;
}
my $mean = $sum / $cnt;
my $var = $squ + -2*$mean*$sum + $mean*$mean*$cnt;
my $std = sqrt($var/$cnt);

while (my ($d) = $iter->()) can be replaced with any loop, including a file reading loop or a database fetching loop.
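
To make the file-reading case concrete, here is a minimal sketch, not the only way to do it. It assumes one number per line. The helper name running_stats and the in-memory filehandle standing in for a real data file are both made up for illustration. It uses the one-expression form of the standard deviation given in the update to this post.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): makes one pass
# over a filehandle holding one number per line, keeping only three
# running totals instead of the whole data set.
sub running_stats {
    my ($fh) = @_;
    my ($cnt, $sum, $squ) = (0, 0, 0);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        $cnt++;
        $sum += $line;
        $squ += $line * $line;
    }
    return unless $cnt;              # no data, nothing to report
    my $mean = $sum / $cnt;
    my $var  = $squ / $cnt - $mean * $mean;
    $var = 0 if $var < 0;            # rounding can push it slightly below zero
    return ($cnt, $mean, sqrt($var));
}

# An in-memory filehandle stands in for a real data file here.
my $data = "2\n4\n4\n4\n5\n5\n7\n9\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ($cnt, $mean, $std) = running_stats($fh);
close $fh;

printf "n=%d mean=%g std=%g\n", $cnt, $mean, $std;   # prints: n=8 mean=5 std=2
```

Swapping the in-memory handle for a handle opened on a real file leaves the loop unchanged, and memory use stays constant however long the file is.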

Update: If you don't need $var anywhere else, the last two statements can be simplified to:

my $std = sqrt($squ/$cnt - $mean*$mean);
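
For anyone who wants to check that the short form really equals the longer one, here is the algebra. The symbols are my own shorthand and are not in the code: n is $cnt, S_1 is $sum, S_2 is $squ, and \mu is $mean.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \mu)^2
  &= \sum_{i=1}^{n} x_i^2 \;-\; 2\mu \sum_{i=1}^{n} x_i \;+\; n\mu^2
   = S_2 - 2\mu S_1 + n\mu^2 \\
  &= S_2 - 2n\mu^2 + n\mu^2
   = S_2 - n\mu^2
   \qquad \text{since } S_1 = n\mu, \\[4pt]
\sigma
  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}
   = \sqrt{\frac{S_2}{n} - \mu^2}.
\end{align*}
```

The first line is what $var holds, and the last line is the simplified $std. Both divide by n, so this is the population standard deviation, not the n-1 sample version.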

Re^3: Correlation plots
by jettero (Monsignor) on Oct 09, 2007 at 18:59 UTC

    Sure. But I started by imagining piddles, and I think you'd have to have them all in memory to use that well also. I've used PDL for a grand total of 30 minutes though, so I could be wrong.

    -Paul

      Your code was an alternative to PDL, which means you don't have to copy its problems. If you're going to sacrifice the speed of PDL, you might as well gain a memory advantage.

      Piddles have the speed advantage.
      My code has the memory advantage.
      Your code has the worse of both.

        Concerning "Your code has the worse of both": it has the advantage of being very easy, though. Have you used a computer in the last 20 years where putting 10,000*100 things in memory was a problem?

        The advantage of Perl is the simple implementations you can achieve, not its speed. If speed were that much of a problem, why not use C or assembler?

        Don't get me wrong, the incremental approach you show is superior. Definitely. But it's not as easy and easy was the point of my "worse" code.

        Lastly, since you got me in an argumentative and defensive mood: if you want to outperform both Perl and piddles, just let the database do the work, which is the point of the last part of my post.

        In short: jeese...

        -Paul