in reply to Re: An efficient, scalable matrix transformation algorithm
in thread An efficient, scalable matrix transformation algorithm
Ah, sorry, you're right. It would be O(n*m) (where m is the number of reduction functions, and n is the number of records).
Indeed, the issue is scale. The original data comes in by sampling various parameters of numerous biological electronic sampling devices over time, and so each record keeps a timestamp. The "a" column would, in this case, be a timestamp, and the others would be, say, averages, maxes, minimums for that time window. This needs to be done for a huge number of devices in parallel, or at least in a linear fashion that won't introduce delays, as this has to happen inline with the sampling before it gets archived.
The purpose of the transformation is to rescale the granularity of the data.
A good example would be:
time+microseconds AVGTEMP MAXTEMP MINTEMP STARTTEMP ENDTEMP
In order to rescale this type of data to a larger time granularity (say, 5 second chunks, 1 minute chunks, etc.), you need to perform the different functions on each column to make them reflect the proper averages, maximums, minimums, etc. for the new time window.
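To make the windowing step concrete, here is a minimal sketch of assigning records to time windows before any reduction runs. The helper names `window_start` and `group_by_window` are hypothetical, and the sketch assumes each record is an array reference whose first element is an epoch timestamp with fractional seconds.

```perl
use strict;
use warnings;
use POSIX qw( floor );

# Hypothetical helper: start of the $resolution-second window containing $ts.
sub window_start {
    my( $resolution, $ts ) = @_;
    return floor( $ts / $resolution ) * $resolution;
}

# Hypothetical helper: group records by the window their timestamp falls in.
# Returns a hash ref mapping window start => array ref of records.
sub group_by_window {
    my( $resolution, $records ) = @_;
    my %bucket;
    push @{ $bucket{ window_start( $resolution, $_->[0] ) } }, $_
        for @{$records};
    return \%bucket;
}

# Illustrative data only: timestamp plus one value column.
my @sample = (
    [ 1234567891.123456, 36.5 ],
    [ 1234567893.654321, 36.2 ],
    [ 1234567896.000001, 36.8 ],
);

my $buckets = group_by_window( 5, \@sample );
printf "%d: %d record(s)\n", $_, scalar @{ $buckets->{$_} }
    for sort { $a <=> $b } keys %{$buckets};
```

With a 5-second resolution, the first two records land in the window starting at 1234567890 and the third in the window starting at 1234567895. Each bucket can then be handed to the per-column functions.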
(I forgot to mention that I also considered RRD for this, but it doesn't include enough consolidation functions, and adding new ones is far from trivial. I will update the original node.)
OK, that's all rather verbose. Here's a code sample, as simple as I can make it and still have it work:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw( min max sum );
    use Data::Dumper;

    my @funcs = (
        undef,    # first_timestamp
        \&avg,
        \&max,
        \&min,
        \&first,
        \&last,
    );

    my @records = (
        # Timestamp.usecs    AVGTEMP MAXTEMP MINTEMP STARTTEMP ENDTEMP
        [ 1234567891.123456, 36.5, 36.9, 36.2, 36.4, 36.6 ],
        [ 1234567891.654321, 36.6, 40.1, 36.2, 36.6, 36.8 ],
        [ 1234567893.123456, 36.3, 38.8, 35.1, 36.8, 37.3 ],
        [ 1234567893.654321, 36.2, 36.9, 36.2, 37.3, 37.1 ],
        [ 1234567894.123456, 36.8, 37.3, 36.2, 37.1, 37.4 ],
    );

    # Main
    print " Timestamp AVG MAX MIN START END\n";
    print( '[ ', join( ', ', @{ $_ } ), " ]\n" ) for( @records );
    print "\n";

    my @output = downsample( 5, \@records );

    print " Timestamp AVG MAX MIN START END\n";
    print '[ ', join( ', ', @output ), " ]\n";

    # Subs
    sub downsample {
        my( $resolution, $data ) = @_;
        my $newdata = transpose( $data );
        my $first_timestamp = first_timestamp_for_resolution( $resolution, $newdata->[0][0] );
        my @output;
        push( @output, $first_timestamp );
        push( @output, $funcs[$_]->( @{$newdata->[$_]} ) ) for ( 1..$#funcs );
        return( @output );
    }

    sub first_timestamp_for_resolution {
        my( $resolution, $value ) = @_;    # $resolution in seconds
        return( int( $value - ( $value % $resolution ) ) );
    }

    # from Math::Matrix
    sub transpose {
        my( $matrix ) = shift;
        my( $m, @result );
        for my $col ( @{ $matrix->[0] } ) {
            push( @result, [] );
        }
        for my $row ( @{$matrix} ) {
            $m=0;
            for my $col ( @{$row} ) {
                push( @{ $result[$m++] }, $col );
            }
        }
        return( \@result );
    }

    sub first {
        return( $_[0] );
    }

    sub last {
        return( $_[-1] );
    }

    sub avg {
        return( sum( @_ ) / scalar( @_ ) );
    }
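Since this has to run inline with the sampling, one alternative worth sketching is to skip the transpose and fold each record into a running accumulator as it arrives. This is only a sketch under assumptions: the names `new_acc`, `fold_record`, and `finish` are hypothetical, the column order is the one used in this post, and the average is an unweighted mean of the incoming averages, which is only correct if every input record covers the same number of raw samples.

```perl
use strict;
use warnings;

# Hypothetical accumulator for one output window.
sub new_acc { return { n => 0 } }

# Fold one record into the accumulator in O(1) time and memory.
# Assumed columns: timestamp, AVGTEMP, MAXTEMP, MINTEMP, STARTTEMP, ENDTEMP
sub fold_record {
    my( $acc, $rec ) = @_;
    my( $ts, $avg, $max, $min, $start, $end ) = @{$rec};
    if( !$acc->{n} ) {
        # The first record of the window seeds the running values.
        @{$acc}{qw( sum max min start )} = ( 0, $max, $min, $start );
    }
    $acc->{sum} += $avg;
    $acc->{max}  = $max if $max > $acc->{max};
    $acc->{min}  = $min if $min < $acc->{min};
    $acc->{end}  = $end;    # the last record seen wins
    $acc->{n}++;
    return $acc;
}

# Produce ( avg, max, min, start, end ) for the window, or an empty list
# if no records were folded in.
sub finish {
    my( $acc ) = @_;
    return () unless $acc->{n};
    return ( $acc->{sum} / $acc->{n}, @{$acc}{qw( max min start end )} );
}

my @window = (
    [ 1234567891.123456, 36.5, 36.9, 36.2, 36.4, 36.6 ],
    [ 1234567891.654321, 36.6, 40.1, 36.2, 36.6, 36.8 ],
    [ 1234567893.123456, 36.3, 38.8, 35.1, 36.8, 37.3 ],
    [ 1234567893.654321, 36.2, 36.9, 36.2, 37.3, 37.1 ],
    [ 1234567894.123456, 36.8, 37.3, 36.2, 37.1, 37.4 ],
);

my $acc = new_acc();
fold_record( $acc, $_ ) for @window;
print join( ', ', finish( $acc ) ), "\n";
```

For a single window this should yield the same five values as the transposing version, while holding only one small hash per device and window instead of a copy of every record.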