Calculate hourly averages

addyrocker has asked for the wisdom of the Perl Monks concerning the following question:

Dear offerers of Perl Wisdom I have data with the following output

 
itemid  FROM_UNIXTIME(clock)       value
26603   2010-10-09 00:00:23     38784360.8136
26603   2010-10-09 00:01:23     36529742.6667
26603   2010-10-09 00:02:23     36966880.8000
26603   2010-10-09 00:03:23     35666405.8667
26603   2010-10-09 01:05:23     35600190.6667
26603   2010-10-09 02:06:23     39175029.2000
26603   2010-10-09 03:07:23     35386478.4000
26603   2010-10-09 04:08:27     35773482.7500
26603   2010-10-09 05:09:23     39073367.5714
26603   2010-10-09 06:10:23     34379049.8667
26603   2010-10-09 07:11:24     33984045.7705
26603   2010-10-09 09:12:23     36276301.8305
26603   2010-10-09 10:13:23     37248366.0000
up to time 23:59
26603   2010-10-09 23:59:23     39393421.3333
[download]

What I'm trying to do is count the number of readings/occurrences for each hour and divide that value to the sum of values for that hour. I've so far been only able to get the occurrences for each hour with the following code

#!/usr/bin/perl
        my $count={};
        while (<>) {
                chop;
                @F = split;
                @J=split(/\:/,$F[2]);
                if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$J[0]}++};
        }
        foreach $key(sort(keys(%count))){
            print "$key\t$count{$key}\n";
        }
[download]

But I'm not able to get the hourly sum for each hour, I've tried different things like util sum and other code. If anyone can suggest a better way of summing up values based on hours I'll be really grateful

Comment on Calculate hourly averages Select or Download Code

Replies are listed 'Best First'.
Re: Calculate hourly averages by roboticus (Chancellor) on Oct 15, 2010 at 15:15 UTC
addyrocker: You're close: `#!/usr/bin/perl my $count={}; my $total={}; while (<>) { chop; @F = split; @J=split(/\:/,$F[2]); if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$J[0]}++; $total{$J[0]} += $F[3]; } } foreach $key(sort(keys(%count))){ print "$key\t$count{$key}\t$total{$key}\n"; }` [download] Note: Untested.... ...roboticus	[reply] [d/l]
Re: Calculate hourly averages by toolic (Bishop) on Oct 15, 2010 at 15:19 UTC
use strict; use warnings; use List::Util qw(sum); my %data; while (<DATA>) { my ($time, $value) = (split)[2..3]; next unless $time =~ /:/; my $hour = (split /:/, $time)[0]; push @{ $data{$hour} }, $value; } for my $hour (sort keys %data) { my @vals = @{ $data{$hour} }; print "Hour=$hour, Count=", scalar @vals, " Avg=", sum(@vals)/@val +s, "\n"; } __DATA__ itemid FROM_UNIXTIME(clock) value 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000 up to time 23:59 26603 2010-10-09 23:59:23 39393421.3333 [download] prints: `Hour=00, Count=4 Avg=36986847.53675 Hour=01, Count=1 Avg=35600190.6667 Hour=02, Count=1 Avg=39175029.2 Hour=03, Count=1 Avg=35386478.4 Hour=04, Count=1 Avg=35773482.75 Hour=05, Count=1 Avg=39073367.5714 Hour=06, Count=1 Avg=34379049.8667 Hour=07, Count=1 Avg=33984045.7705 Hour=09, Count=1 Avg=36276301.8305 Hour=10, Count=1 Avg=37248366 Hour=23, Count=1 Avg=39393421.3333` [download]	[reply] [d/l] [select]
Re: Calculate hourly averages by dwm042 (Priest) on Oct 15, 2010 at 15:23 UTC
Is this more what you were thinking of? #!/usr/bin/perl use warnings; use strict; my %hours; while(<DATA>) { next unless $_ =~ /\w+/; my @F = split; if ( $F[2] =~ /(\d{2})\:\d{2}:\d{2}/ ) { $hours{$1}{value} += $F[3]; $hours{$1}{count}++; } } for ( sort { $a <=> $b } keys %hours ) { print "Sum at $_ hours is ", $hours{$_}{value}, "\n"; print "Count at $_ hours is ", $hours{$_}{count}, "\n"; } __DATA__ 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000 [download] which yields the results: C:\Code>perl values_per_hour.pl Sum at 00 hours is 147947390.147 Count at 00 hours is 4 Sum at 01 hours is 35600190.6667 Count at 01 hours is 1 Sum at 02 hours is 39175029.2 Count at 02 hours is 1 Sum at 03 hours is 35386478.4 Count at 03 hours is 1 Sum at 04 hours is 35773482.75 Count at 04 hours is 1 Sum at 05 hours is 39073367.5714 Count at 05 hours is 1 Sum at 06 hours is 34379049.8667 Count at 06 hours is 1 Sum at 07 hours is 33984045.7705 Count at 07 hours is 1 Sum at 09 hours is 36276301.8305 Count at 09 hours is 1 Sum at 10 hours is 37248366 Count at 10 hours is 1 C:\Code> [download]	[reply] [d/l] [select]
Re: Calculate hourly averages by moritz (Cardinal) on Oct 15, 2010 at 15:28 UTC
If you want averages over the hours, you have to use the hour as a hash key: #!/usr/bin/perl use strict; use warnings; my %count; my %sum; while (<DATA>) { chomp; next if /^itemid/; my @F = split; my @J = split(/\:/,$F[2]); my $hour = $J[0]; if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$hour}++; $sum{$hour} += $F[-1]; }; } for (sort keys %count ) { print "$_ ", $sum{$_} / $count{$_}, "\n"; } __DATA__ itemid FROM_UNIXTIME(clock) value 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000 [download] If the logfile entries are always sorted by hour, you don't even need the hash: `use strict; use warnings; my $count = 0; my $sum = 0; my $previous_hour = -1; while (<DATA>) { chomp; next if /^itemid/; my @F = split; my @J = split(/\:/,$F[2]); my $hour = $J[0]; if ($hour != $previous_hour && $count) { print $previous_hour, " ", $sum / $count, "\n"; $sum = 0; $count = 0; } $sum += $F[-1]; $count++; $previous_hour = $hour; }` [download] Perl 6 - links to (nearly) everything that is Perl 6.	[reply] [d/l] [select]
Re: Calculate hourly averages by locked_user sundialsvc4 (Abbot) on Oct 15, 2010 at 17:04 UTC
It also occurs to me that you could stuff this into an SQL table and `GROUP BY` an expression that discards minutes. Or, you could write a stub Perl script that pre-processes the data for such a purpose. I make this suggestion on the basis that, “you might wish to get to SQL-land as soon as you can, for other reasons.”
Re^2: Calculate hourly averages by addyrocker (Initiate) on Oct 20, 2010 at 13:02 UTC
Thank you all for your valuable input it has really given me a lot to think about especially in terms of the different ways it can be done, will also give the SQL way a try thanks again!	[reply]
Re: Calculate hourly averages by Anonymous Monk on Oct 15, 2010 at 15:15 UTC
`my $re = qr/ ( \d+ ) # $1 id \s+ \d{4}-\d{2}-\d{2}\s ( \d{2} ) :\d{2}:\d{2} # $2 hour \s+ (\S+) # $3 val /x; while(<>){ if( /$re/ ){ my( $id, $hour, $value ) = ( $1 , $2, $3 ); ...` [download]	[reply] [d/l]