in reply to std dev calculations slow over time

Alternatively, if you choose to use a database, you can take advantage of MySQL's (or Oracle's) DEV (or STDDEV) operator directly.

However, this solution scales less well than BrowserUK's pure perl solution.

mysql 4.1 on linux 2.6 with perl 5.8.5

#!/usr/bin/perl -slw use strict; use Benchmark::Timer; use List::Util qw[ sum ]; use constant { ID => 0, DISCHARGE => 1, PRECIPITATION => 2 }; use DBI; my $dbh = DBI->connect ('DBI:mysql:test', 'user', 'pass'); my $sth = $dbh->prepare ('drop table if exists hydro;'); $sth->execute (); my $sql=<<EOL; CREATE TABLE hydro ( hydro_id int unsigned primary key, discharge integer not null, precipitation integer not null) EOL $sth = $dbh->prepare ($sql); $sth->execute (); sub stdDev { my $ave = 0; $ave += $_ for @_; $ave /= @_; my $variance = sum( map +($_ - $ave)**2, @_ ) / @_; return sqrt $variance; } our $G ||= 100; our $N ||= 1e3; my $T = new Benchmark::Timer; $sth = $dbh->prepare ('insert into hydro values(?,?,?)'); for (1 .. $N) { die ("insert error [$N]") unless $sth->execute ($_, rand 100, rand 1 +0); } my $label = "Moving Std Dev: $G of $N items"; $T->start( $label ); $G--; my @stdDevs = map [], 1 .. $N - $G; $sth = $dbh->prepare ('select stddev(discharge), stddev(precipitation) + from hydro where hydro_id>=? and hydro_id<?'); die ("sth undef") unless ($sth); for my $item ( 0 .. $N - $G ) { die ("execute error [$item]") unless $sth->execute ($item, $item+100 +); my ($sdd, $sdp) = $sth->fetchrow_array(); @{ $stdDevs[ $item ] } = ( $sdd, $sdp ); } $T->stop( $label ); print $T->report; #printf "id:$_ StdDev(Discharge):$stdDevs[ $_ ][ 0 ] StdDev(Precipitat +ion):$stdDevs[ $_ ][ 1 ]\n" for 0 .. $#stdDevs; __END__ =pod =head1 perl $ ./579883.pl -N=1e3 1 trial of Moving Std Dev: 100 of 1e3 items (303.168ms total) $ ./579883.pl -N=1e4 1 trial of Moving Std Dev: 100 of 1e4 items (3.355s total) $ ./579883.pl -N=1e5 1 trial of Moving Std Dev: 100 of 1e5 items (41.562s total) $ ./579883.pl -N=1e6 1 trial of Moving Std Dev: 100 of 1e6 items (340.030s total) =head1 mysql $ ./579883.mysql.pl -N=1e3 1 trial of Moving Std Dev: 100 of 1e3 items (401.692ms total) $ ./579883.mysql.pl -N=1e4 1 trial of Moving Std Dev: 100 of 1e4 items (4.405s total) $ ./579883.mysql.pl -N=1e5 1 trial of Moving Std Dev: 100 of 1e5 items (46.765s total) $ ./579883.mysql.pl -N=1e6 1 trial of Moving Std Dev: 100 of 1e6 items (516.869s total) =cut
--
Andreas

Replies are listed 'Best First'.
Re^2: std dev calculations slow over time
by BrowserUk (Patriarch) on Oct 24, 2006 at 04:43 UTC

    Nice++.

    How refreshing to see some working DB code posted and benchmarked rather than just alluded to.

    As for the scaling, the pure perl only wins whilst the dataset will fit in memory...then your DB appproach wins hands down without resorting to messy overlapping reads :)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.