in reply to Identifying Outlier Clusters
It's quite easy, and relatively efficient, to calculate multiple-period moving averages concurrently, requiring just a single pass over the data:
```perl
#! perl -slw
use strict;
use List::Util qw[ sum ];

our $LOW   //= 10;
our $HIGH  //= 100;
our $LIMIT //= 0.85;

my %seqs = map{ $_ => [] } $LOW .. $HIGH;

for my $i ( 1 .. 1e6 ) {
    my $value = rand 1;
    for my $period ( $LOW .. $HIGH ) {
        if( @{ $seqs{ $period } } < $period ) {
            push @{ $seqs{ $period } }, $value;
        }
        else {
            shift @{ $seqs{ $period } };
            push @{ $seqs{ $period } }, $value;
            my $ave = sum( @{ $seqs{ $period } } ) / @{ $seqs{ $period } };
            if( $ave > $LIMIT ) {
                printf "Possible $period period cluster (ave:%.3f)".
                    "at %d through %d\n", $ave, $i - $period, $i;
            }
        }
    }
}
__END__
C:\test>junk81.pl
Possible 10 period cluster (ave:0.869)at 16470 through 16480
Possible 10 period cluster (ave:0.855)at 16471 through 16481
Possible 11 period cluster (ave:0.858)at 16470 through 16481
Possible 10 period cluster (ave:0.857)at 101204 through 101214
Possible 10 period cluster (ave:0.852)at 101205 through 101215
Possible 10 period cluster (ave:0.856)at 146170 through 146180
Possible 10 period cluster (ave:0.866)at 211311 through 211321
Possible 10 period cluster (ave:0.864)at 323774 through 323784
Possible 10 period cluster (ave:0.868)at 442882 through 442892
Possible 12 period cluster (ave:0.850)at 452199 through 452211
Possible 12 period cluster (ave:0.851)at 452200 through 452212
Possible 13 period cluster (ave:0.856)at 452199 through 452212
Possible 10 period cluster (ave:0.852)at 557989 through 557999
Possible 10 period cluster (ave:0.863)at 700401 through 700411
Possible 10 period cluster (ave:0.858)at 759150 through 759160
Possible 10 period cluster (ave:0.862)at 759151 through 759161
Possible 11 period cluster (ave:0.864)at 759150 through 759161
Possible 10 period cluster (ave:0.853)at 759152 through 759162
Possible 11 period cluster (ave:0.866)at 759151 through 759162
Possible 12 period cluster (ave:0.868)at 759150 through 759162
Possible 13 period cluster (ave:0.853)at 759149 through 759162
Possible 13 period cluster (ave:0.850)at 759151 through 759164
Possible 14 period cluster (ave:0.852)at 759150 through 759164
Possible 10 period cluster (ave:0.866)at 786191 through 786201
Possible 10 period cluster (ave:0.851)at 805682 through 805692
Possible 10 period cluster (ave:0.864)at 965272 through 965282
Possible 10 period cluster (ave:0.853)at 992279 through 992289
Possible 10 period cluster (ave:0.886)at 992281 through 992291
```
That took just about 4 minutes to calculate 91 different moving averages (periods 10 through 100) over a million data points. There's no IO involved, but as the data only needs a single pass, the IO cost would be an unavoidable constant anyway.
You can obviously do fewer, or non-consecutive, periods.
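As a rough sketch of that last point, the periods can be passed in as an arbitrary list. The version below also keeps a running total per period rather than re-summing each window on every step, which is a refinement the code as posted does not use. The names `moving_averages` and its callback argument are hypothetical, chosen only for illustration, and the sketch assumes the data is already held in an array rather than generated on the fly.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper: calls $cb->( $period, $ave, $i ) for every full
# window of every period, where $i is the 0-based index of the last
# value in the window. A running total per period replaces the re-sum.
sub moving_averages {
    my( $data, $periods, $cb ) = @_;
    my %sum = map { $_ => 0 } @$periods;

    for my $i ( 0 .. $#$data ) {
        for my $period ( @$periods ) {
            $sum{ $period } += $data->[ $i ];

            # Drop the value that has just fallen out of the window.
            $sum{ $period } -= $data->[ $i - $period ] if $i >= $period;

            # Report only once the window holds $period values.
            $cb->( $period, $sum{ $period } / $period, $i )
                if $i >= $period - 1;
        }
    }
}

# Demo with random data and an assumed, non-consecutive set of periods.
my @data = map { rand 1 } 1 .. 100_000;

moving_averages( \@data, [ 10, 20, 50, 100 ], sub {
    my( $period, $ave, $i ) = @_;
    printf "Possible %d period cluster (ave:%.3f) at %d through %d\n",
        $period, $ave, $i - $period + 1, $i
        if $ave > 0.85;
} );
```

Each step now costs one addition and one subtraction per period, whatever the window length. Over very long runs the running totals can pick up floating-point rounding error, so it may be worth re-summing a window from scratch every so often.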