in reply to Correlation plots

It's probably worth learning PDL (the Perl Data Language); that seems to be the cool thing if you have real data. I couldn't get into it myself for lack of something to do with it.

On the other hand, it's not that hard to do these by hand, particularly in Perl.

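use strict;
use warnings;
use List::Util qw(sum);

my @d    = ( 1 .. 10_000 );
my $s    = sum @d;
my $mean = $s / @d;                            # @d in numeric context is the count
my $var  = sum map { ( $_ - $mean )**2 } @d;   # sum of squared deviations (not yet divided by n)
my $std  = sqrt( $var / @d );                  # population standard deviation
# etc...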

... The more I think about it, though, if you're pulling these from a database, you don't really even need to do the stddev by hand. I imagine your database of choice has a stddev() built in. The covariance would probably have to be calculated by hand, though. Most databases won't let you nest avg() inside sum(), but maybe "select avg(cola*colb) - avg(cola)*avg(colb) from tablename" ... or something like that.
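Doing the covariance by hand in Perl isn't much worse than the stddev. Here's a minimal sketch, assuming two equal-length arrays held in memory and population (divide-by-n) statistics like the code above; `covariance` and `correlation` are illustrative names, not from any module. Dividing the covariance by both standard deviations gives the Pearson correlation coefficient, which is usually the number a correlation plot is summarizing.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Population covariance of two equal-length array refs:
# the mean of the products of each pair's deviations from its own mean.
sub covariance {
    my ($x, $y) = @_;
    die "arrays differ in length\n" unless @$x == @$y;
    my $n = @$x;
    die "no data\n" unless $n;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    return sum(map { ($x->[$_] - $mean_x) * ($y->[$_] - $mean_y) } 0 .. $n - 1) / $n;
}

# Pearson correlation: covariance scaled by both standard deviations
# (covariance of a series with itself is its variance).
sub correlation {
    my ($x, $y) = @_;
    return covariance($x, $y) / sqrt(covariance($x, $x) * covariance($y, $y));
}

my @a = (1, 2, 3, 4, 5);
my @b = (2, 4, 6, 8, 10);
printf "cov = %g, r = %g\n", covariance(\@a, \@b), correlation(\@a, \@b);
```

Perfectly linear data like this gives r = 1; r near 0 means no linear relationship.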

-Paul

Re^2: Correlation plots
by ikegami (Patriarch) on Oct 09, 2007 at 18:37 UTC

    Your method requires all the data to be in memory at once, but it's simple to refactor it so that isn't the case.

    my ($cnt, $sum, $squ) = (0, 0, 0);
    while (my ($d) = $iter->()) {
        $cnt++;
        $sum += $d;         # running sum
        $squ += $d * $d;    # running sum of squares
    }
    my $mean = $sum / $cnt;
    my $var  = $squ - 2*$mean*$sum + $mean*$mean*$cnt;   # sum of squared deviations
    my $std  = sqrt($var/$cnt);

    while (my ($d) = $iter->()) can be replaced with any loop, including a file reading loop or a database fetching loop.
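    For example, here's the same one-pass calculation driven by a file-reading loop instead of `$iter`. This is a sketch: the in-memory filehandle (`open` on a scalar reference) stands in for a real file, it assumes one number per line, and a DBI `while (my ($d) = $sth->fetchrow_array)` loop would slot in the same way.

```perl
use strict;
use warnings;

# Stand-in for a real file: an in-memory filehandle, one number per line.
# For a real file, use open(my $fh, '<', $path) instead.
my $data = join '', map { "$_\n" } 1 .. 10_000;
open my $fh, '<', \$data or die "can't open in-memory handle: $!";

# Only three running totals are kept, so memory use doesn't grow
# with the number of lines read.
my ($cnt, $sum, $squ) = (0, 0, 0);
while (my $d = <$fh>) {
    chomp $d;
    next unless length $d;   # skip blank lines
    $cnt++;
    $sum += $d;
    $squ += $d * $d;
}
close $fh;

die "no data\n" unless $cnt;
my $mean = $sum / $cnt;
my $std  = sqrt($squ / $cnt - $mean * $mean);   # population standard deviation
printf "n = %d, mean = %g, std = %g\n", $cnt, $mean, $std;
```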

    Update: If you don't need $var anywhere else, the last two lines can be simplified to

    my $std = sqrt($squ/$cnt - $mean*$mean);
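    One caveat: when the values are large compared with their spread, `$squ/$cnt - $mean*$mean` subtracts two nearly equal numbers and can lose most of its precision. Welford's online algorithm is a swapped-in alternative (not what's proposed in this thread) that is still one pass and constant memory but updates the mean incrementally. A minimal sketch, assuming an iterator shaped like `$iter` above (one value per call, an empty list at the end); `welford_std` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Welford's online algorithm: updates the mean and the running sum of
# squared deviations ($m2) one value at a time, avoiding the
# large-minus-large subtraction of the sum-of-squares formula.
sub welford_std {
    my ($iter) = @_;    # code ref: one value per call, empty list when done
    my ($cnt, $mean, $m2) = (0, 0, 0);
    while (my ($d) = $iter->()) {
        $cnt++;
        my $delta = $d - $mean;
        $mean += $delta / $cnt;
        $m2   += $delta * ($d - $mean);   # uses the updated mean
    }
    die "no data\n" unless $cnt;
    return ($mean, sqrt($m2 / $cnt));     # population standard deviation
}

# Hypothetical data with a large offset and a small spread.
my @vals = map { 1e9 + $_ } 1 .. 4;
my $i    = 0;
my $iter = sub { $i < @vals ? ($vals[$i++]) : () };

my ($mean, $std) = welford_std($iter);
printf "mean = %.1f, std = %.6f\n", $mean, $std;
```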

      Sure. But I started out imagining piddles, and I think you'd have to have all the data in memory to use those well, too. I've used PDL for a grand total of 30 minutes, though, so I could be wrong.

      -Paul

        Your code was an alternative to PDL, which means you don't have to copy its problems. If you're going to sacrifice the speed of PDL, you might as well gain a memory advantage.

        Piddles have the speed advantage.
        My code has the memory advantage.
        Your code has the worst of both.