in reply to Identifying Outlier Clusters

It is quite easy, and relatively efficient, to calculate moving averages for multiple periods concurrently, requiring just a single pass over the data:

    #! perl -slw
    use strict;
    use List::Util qw[ sum ];

    our $LOW   //= 10;
    our $HIGH  //= 100;
    our $LIMIT //= 0.85;

    my %seqs = map{ $_ => [] } $LOW .. $HIGH;

    for my $i ( 1 .. 1e6 ) {
        my $value = rand 1;
        for my $period ( $LOW .. $HIGH ) {
            if( @{ $seqs{ $period } } < $period ) {
                push @{ $seqs{ $period } }, $value;
            }
            else {
                shift @{ $seqs{ $period } };
                push @{ $seqs{ $period } }, $value;
                my $ave = sum( @{ $seqs{ $period } } ) / @{ $seqs{ $period } };
                if( $ave > $LIMIT ) {
                    printf "Possible $period period cluster (ave:%.3f)".
                        "at %d through %d\n", $ave, $i - $period, $i;
                }
            }
        }
    }
    __END__
    C:\test>junk81.pl
    Possible 10 period cluster (ave:0.869)at 16470 through 16480
    Possible 10 period cluster (ave:0.855)at 16471 through 16481
    Possible 11 period cluster (ave:0.858)at 16470 through 16481
    Possible 10 period cluster (ave:0.857)at 101204 through 101214
    Possible 10 period cluster (ave:0.852)at 101205 through 101215
    Possible 10 period cluster (ave:0.856)at 146170 through 146180
    Possible 10 period cluster (ave:0.866)at 211311 through 211321
    Possible 10 period cluster (ave:0.864)at 323774 through 323784
    Possible 10 period cluster (ave:0.868)at 442882 through 442892
    Possible 12 period cluster (ave:0.850)at 452199 through 452211
    Possible 12 period cluster (ave:0.851)at 452200 through 452212
    Possible 13 period cluster (ave:0.856)at 452199 through 452212
    Possible 10 period cluster (ave:0.852)at 557989 through 557999
    Possible 10 period cluster (ave:0.863)at 700401 through 700411
    Possible 10 period cluster (ave:0.858)at 759150 through 759160
    Possible 10 period cluster (ave:0.862)at 759151 through 759161
    Possible 11 period cluster (ave:0.864)at 759150 through 759161
    Possible 10 period cluster (ave:0.853)at 759152 through 759162
    Possible 11 period cluster (ave:0.866)at 759151 through 759162
    Possible 12 period cluster (ave:0.868)at 759150 through 759162
    Possible 13 period cluster (ave:0.853)at 759149 through 759162
    Possible 13 period cluster (ave:0.850)at 759151 through 759164
    Possible 14 period cluster (ave:0.852)at 759150 through 759164
    Possible 10 period cluster (ave:0.866)at 786191 through 786201
    Possible 10 period cluster (ave:0.851)at 805682 through 805692
    Possible 10 period cluster (ave:0.864)at 965272 through 965282
    Possible 10 period cluster (ave:0.853)at 992279 through 992289
    Possible 10 period cluster (ave:0.886)at 992281 through 992291

That took just about 4 minutes to calculate 91 different moving averages (periods 10 through 100) over a million data points. There's no I/O involved, but since only a single pass is required, reading the data is an unavoidable constant cost anyway.
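Most of that time probably goes on the sum() call, which re-adds the whole window at every step. One possible variation, not what the code above does, is to keep a running total per period and adjust it as values enter and leave the window. A minimal sketch, where make_tracker is a hypothetical helper name of my own; unlike the original, it also reports the average on the step that first fills the window, and a running total can accumulate a little floating-point drift over very long runs:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that tracks one moving average
# of the given period using a running total instead of re-summing.
sub make_tracker {
    my( $period ) = @_;
    my @window;
    my $total = 0;
    return sub {
        my( $value ) = @_;
        push @window, $value;
        $total += $value;
        # Drop the oldest value once the window exceeds the period.
        $total -= shift @window if @window > $period;
        # undef until the window is full.
        return @window == $period ? $total / $period : undef;
    };
}

# Small demonstration with two periods over the values 1 .. 8.
my %trackers = map { $_ => make_tracker( $_ ) } 3, 5;
for my $i ( 1 .. 8 ) {
    for my $period ( sort { $a <=> $b } keys %trackers ) {
        my $ave = $trackers{ $period }->( $i );
        printf "i=%d period=%d ave=%.3f\n", $i, $period, $ave
            if defined $ave;
    }
}
```

Each update is then constant time per period, regardless of the window length.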

You can obviously track fewer periods, or non-consecutive ones.
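For example, driving the loop from an explicit list of periods rather than a range might look like the sketch below. The moving_averages helper and its return structure are assumptions of mine for illustration, not part of the original script:

```perl
use strict;
use warnings;
use List::Util qw[ sum ];

# Hypothetical helper: one pass over @$data, one window per requested period.
# Returns { period => [ [ end_index, average ], ... ] }.
sub moving_averages {
    my( $data, $periods ) = @_;
    my %seqs = map { $_ => [] } @$periods;
    my %aves = map { $_ => [] } @$periods;
    for my $i ( 0 .. $#$data ) {
        for my $period ( @$periods ) {
            my $win = $seqs{ $period };
            push @$win, $data->[ $i ];
            shift @$win if @$win > $period;
            # Record an average only once the window is full.
            push @{ $aves{ $period } }, [ $i, sum( @$win ) / $period ]
                if @$win == $period;
        }
    }
    return \%aves;
}

# Two non-consecutive periods over a tiny data set.
my $aves = moving_averages( [ 1 .. 10 ], [ 2, 5 ] );
printf "period %d: %d averages, last %.1f\n",
    $_, scalar @{ $aves->{ $_ } }, $aves->{ $_ }[ -1 ][ 1 ]
    for 2, 5;
```

The cost scales with the number of periods tracked, so a short list such as ( 10, 20, 50, 100 ) should run far faster than the full 10 .. 100 range.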

