Earindil has asked for the wisdom of the Perl Monks concerning the following question:

All Knowing Monks,
I would like to rearrange this data so that it looks like this:

timestamp min avg max errorcode

The source data is always going to be in 5 minute increments but there can be anywhere from 0 to X number of elements per 5 minute increment. I'm trying to do this in order to build some candlestick graphs and would appreciate any assistance.
04/26/04.16:15 3.088 N 04/26/04.16:15 5.994 N 04/26/04.16:15 4.391 N 04/26/04.16:15 3.616 N 04/26/04.16:20 3.997 N 04/26/04.16:20 4.151 N 04/26/04.16:20 3.751 N 04/26/04.16:20 2.887 N 04/26/04.16:25 3.657 N 04/26/04.16:25 4.712 N 04/26/04.16:25 4.346 N 04/26/04.16:30 3.628 N 04/26/04.16:30 5.636 N 04/26/04.16:30 4.593 N

Replies are listed 'Best First'.
Re: rearrange data to min, avg, max
by kvale (Monsignor) on Apr 30, 2004 at 16:00 UTC
    Try this:
    use warnings; use strict; use List::Util qw(max min sum); my %range; while (<DATA>) { my ($time, $value, $error) = split /\s+/; push @{ $range{$time} }, $value; } foreach my $time (sort keys %range) { print "$time "; print min( @{ $range{$time} } ) . " "; print sum( @{ $range{$time} } )/@{ $range{$time} } . " "; print max( @{ $range{$time} } ) . "\n"; } __DATA__ 04/26/04.16:15 3.088 N 04/26/04.16:15 5.994 N 04/26/04.16:15 4.391 N 04/26/04.16:15 3.616 N 04/26/04.16:20 3.997 N 04/26/04.16:20 4.151 N 04/26/04.16:20 3.751 N 04/26/04.16:20 2.887 N 04/26/04.16:25 3.657 N 04/26/04.16:25 4.712 N 04/26/04.16:25 4.346 N 04/26/04.16:30 3.628 N 04/26/04.16:30 5.636 N 04/26/04.16:30 4.593 N
    Here, I have ignored the error code because I did not know what to do in case of conflict between readings.

    -Mark

Re: rearrange data to min, avg, max
by Roy Johnson (Monsignor) on Apr 30, 2004 at 15:51 UTC
    So you want to read in all the data for one timestamp, calculate the min, avg, max, and note the errorcode(s), and output the one-line summary, right? Something like (formerly untested but now updated and tested thanks to comments from Limbic~Region):
    my $current_ts = ''; my $rowcount = 0; my ($min, $max, $sum, %ec); while (<>) { chomp; my ($ts, $v, $e) = split; if ($current_ts ne $ts) { if ($current_ts eq '') { $current_ts = $ts } else { &output_summary } $current_ts = $ts; ($min, $max, $sum) = ($v, $v, $v); %ec = ($e, 1); $rowcount = 1; } else { $min = $v if ($v < $min); $max = $v if ($v > $max); $sum += $v; ++$rowcount; ++$ec{$e}; } } &output_summary; sub output_summary { printf "%s\t%g\t%g\t%g\t%s\n", $current_ts, $min, $max, $sum/$rowcou +nt, join(',', keys %ec); }

    The PerlMonk tr/// Advocate
      Roy Johnson,
      I understand it is un-tested, so some nits:
      • When you call the output_summary sub, you need to use () as a suffix or & as a prefix unless it has been previously seen.
      • You forgot to chomp newlines
      • ++$rowcount not +$rowcount
      • ++$ec{$e} not ++%ec{$e}
      • $min and $max need to have initial values prior to doing numerical comparisons
      Finally, I don't think it is working correctly - shouldn't the max for :20 be 4.151?.
      Here is my stab at it:
      #!/usr/bin/perl use strict; use warnings; my %data; while ( <DATA> ) { chomp; my ($stamp, $reading) = split; next if ! $stamp || ! defined $reading; $data{$stamp}{COUNT}++; $data{$stamp}{MAX} = $reading if ! defined $data{$stamp}{MAX} || $ +reading > $data{$stamp}{MAX}; $data{$stamp}{MIN} = $reading if ! defined $data{$stamp}{MIN} || $ +reading < $data{$stamp}{MIN}; $data{$stamp}{SUM} += $reading; } print join "\t" , "TIMESTAMP", "MAX", "MIN", "AVG"; print "\n"; for my $stamp ( sort keys %data ) { print join "\t", $stamp, @{$data{$stamp}}{ qw(MAX MIN) }, $data{$s +tamp}{SUM} / $data{$stamp}{COUNT}; print "\n"; }

      L~R

Re: rearrange data to min, avg, max
by cyocum (Curate) on Apr 30, 2004 at 16:07 UTC

    Based on the data you have shown. I wrote the code below. However, please be warned that I wrote this off the top of my head and it may not work or even do what I purport it to do.

    use strict; use warnings; my @array; my @values; my $avg; while(my $line = <DATA>) { push @array, $line; } foreach my $line2 (@array) { my ($time, $val, $err) = split / /, $line2; push @values, $val; my $tot_val += $val; $avg = $tot_val / $#array; } my @sort_max = sort {$a <=> $b} @values; my @sort_min = soft {$b <=> $a} @values; my $max_val = pop @sort_max; my $min_val = pop @sort_min; foreach my $line3 (@array) { my ($time, $val, $err) = split / /, $line3; if($val == $max_val) { print "max: $line3" . "\n"; } elsif($val == $min_val) { print "min: $line3" . "\n"; } } print "avg: $avg" . "\m"; __DATA__ 04/26/04.16:15 3.088 N 04/26/04.16:15 5.994 N 04/26/04.16:15 4.391 N 04/26/04.16:15 3.616 N 04/26/04.16:20 3.997 N 04/26/04.16:20 4.151 N 04/26/04.16:20 3.751 N 04/26/04.16:20 2.887 N 04/26/04.16:25 3.657 N 04/26/04.16:25 4.712 N 04/26/04.16:25 4.346 N 04/26/04.16:30 3.628 N 04/26/04.16:30 5.636 N 04/26/04.16:30 4.593 N

Re: rearrange data to min, avg, max
by Earindil (Beadle) on Apr 30, 2004 at 17:06 UTC
    Due to my inability to wrap my head around arrays of hashes, I had ended up doing it like this:
    $data_min{"$timestamp"} = $seconds if (!$data_min{"$timestamp" +}); $data_min{"$timestamp"} = $seconds if ($seconds<$data_min{"$ti +mestamp"}); $data_max{"$timestamp"} = $seconds if ($seconds>$data_max{"$ti +mestamp"}); $data_avg{"$timestamp"}+= $seconds; $data_counter{"$timestamp"} ++;
    Then I simply looped through the keys of $data_counter and computed averages by dividing $data_avg/$data_counter I'll definetly look at all the suggestions and make use of them. Thanks much.