Re: Correlation plots

It's probably worth learning PDL; that seems to be the cool thing if you have real data. I never figured it out myself, for lack of something to do with it.

On the other hand, it's not that hard to do these by hand, particularly in Perl.

  use strict;
  use warnings;
  use List::Util qw(sum);
  my @d    = ( 1 .. 10_000 );
  my $s    = sum @d;
  my $mean = $s / @d;                                 # @d in scalar context is the count
  my $var  = sum( map { ($_ - $mean)**2 } @d ) / @d;  # population variance
  my $std  = sqrt($var);

# etc...

... The more I think about it though, if you're pulling these from a database, you don't really even need to do the stddev by hand. I imagine your database of choice has a stddev() built in. The covariance would probably have to be calculated by hand though. Maybe "select avg(cola*colb) - avg(cola)*avg(colb) from tablename" ... or something like that.
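
If the data does end up in Perl anyway, the covariance (and from it the correlation) can be done by hand in the same style as the stddev above. This is only a rough sketch: `covariance`, `correlation`, `@cola` and `@colb` are names made up for illustration, the sample numbers are invented, and it computes the population (divide by N) versions. It also assumes neither column is constant, since that would make a standard deviation zero.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Population covariance of two equal-length lists (passed as array refs).
sub covariance {
    my ($x, $y) = @_;
    my $n = @$x;
    die "lists must be non-empty and the same length\n"
        unless $n && $n == @$y;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    return sum( map { ($x->[$_] - $mean_x) * ($y->[$_] - $mean_y) } 0 .. $n - 1 ) / $n;
}

# Pearson correlation: covariance scaled by both standard deviations.
# The variance of a list is just its covariance with itself.
sub correlation {
    my ($x, $y) = @_;
    my $sd_x = sqrt( covariance($x, $x) );
    my $sd_y = sqrt( covariance($y, $y) );
    return covariance($x, $y) / ( $sd_x * $sd_y );
}

# Hypothetical sample data, standing in for two database columns.
my @cola = ( 1, 2, 3, 4, 5 );
my @colb = ( 2, 4, 6, 8, 10 );
printf "cov = %.3f, r = %.3f\n",
    covariance( \@cola, \@colb ), correlation( \@cola, \@colb );
```

Like the stddev code, this keeps both lists in memory and walks them more than once, which is fine for a few thousand rows.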

-Paul

Re^2: Correlation plots by ikegami (Patriarch) on Oct 09, 2007 at 18:37 UTC
Your method requires all the data to be in memory at once, but it's simple to refactor it so that's not the case.

  my ($cnt, $sum, $squ);
  while (my ($d) = $iter->()) {
    $cnt++;
    $sum += $d;
    $squ += $d * $d;
  }
  my $mean = $sum / $cnt;
  my $var  = $squ + -2*$mean*$sum + $mean*$mean*$cnt;
  my $std  = sqrt($var/$cnt);

`while (my ($d) = $iter->())` can be replaced with any loop, including a file reading loop or a database fetching loop.

Update: If you don't need `$var` anywhere else, the last two lines can be simplified to

  my $std = sqrt($squ/$cnt - $mean*$mean);
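
Since the thread is about correlation, the same running-sums idea should stretch to pairs of values: keep the count, both sums, both sums of squares and the sum of products, then work out the correlation at the end. What follows is a sketch under assumptions rather than anything posted in the thread: `make_pair_iter` and `streaming_correlation` are hypothetical names, the iterator is assumed to return one `[x, y]` array ref per call and an empty list when exhausted, and the data is made up.

```perl
use strict;
use warnings;

# Hypothetical iterator over [x, y] pairs; returns an empty list when done.
# A file-reading or database-fetching loop could stand in for it.
sub make_pair_iter {
    my @pairs = @_;
    return sub { return @pairs ? shift @pairs : () };
}

# One pass, constant memory: only six running totals are kept.
sub streaming_correlation {
    my ($iter) = @_;
    my ( $cnt, $sx, $sy, $sxx, $syy, $sxy ) = (0) x 6;
    while ( my ($pair) = $iter->() ) {
        my ( $x, $y ) = @$pair;
        $cnt++;
        $sx  += $x;
        $sy  += $y;
        $sxx += $x * $x;
        $syy += $y * $y;
        $sxy += $x * $y;
    }
    die "no data\n" unless $cnt;
    my $mean_x = $sx / $cnt;
    my $mean_y = $sy / $cnt;
    my $cov    = $sxy / $cnt - $mean_x * $mean_y;   # population covariance
    my $var_x  = $sxx / $cnt - $mean_x * $mean_x;   # population variances
    my $var_y  = $syy / $cnt - $mean_y * $mean_y;
    return $cov / sqrt( $var_x * $var_y );
}

my $iter = make_pair_iter( [ 1, 2 ], [ 2, 4 ], [ 3, 6 ], [ 4, 8 ], [ 5, 10 ] );
printf "r = %.3f\n", streaming_correlation($iter);
```

One caveat with this formulation: subtracting two large, nearly equal totals can lose precision when the mean is large compared to the spread, so for data like that a two-pass calculation may be the safer choice.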
Re^3: Correlation plots by jettero (Monsignor) on Oct 09, 2007 at 18:59 UTC
Sure. But I started by imagining piddles, and I think you'd have to have them all in memory to use those well, too. I've used PDL for a grand total of 30 minutes though, so I could be wrong. -Paul
Re^4: Correlation plots by ikegami (Patriarch) on Oct 09, 2007 at 19:08 UTC
Your code was an alternative to PDL, which means you don't have to copy its problems. If you're going to sacrifice the speed of PDL, you might as well gain a memory advantage. Piddles have the speed advantage. My code has the memory advantage. Your code has the worst of both.
Re^5: Correlation plots by jettero (Monsignor) on Oct 11, 2007 at 11:20 UTC