addyrocker has asked for the wisdom of the Perl Monks concerning the following question:

Dear offerers of Perl Wisdom I have data with the following output

itemid FROM_UNIXTIME(clock) value 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000 up to time 23:59 26603 2010-10-09 23:59:23 39393421.3333

What I'm trying to do is count the number of readings/occurrences for each hour and divide that value to the sum of values for that hour. I've so far been only able to get the occurrences for each hour with the following code

#!/usr/bin/perl my $count={}; while (<>) { chop; @F = split; @J=split(/\:/,$F[2]); if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$J[0]}++}; } foreach $key(sort(keys(%count))){ print "$key\t$count{$key}\n"; }

But I'm not able to get the hourly sum for each hour, I've tried different things like util sum and other code. If anyone can suggest a better way of summing up values based on hours I'll be really grateful

Replies are listed 'Best First'.
Re: Calculate hourly averages
by roboticus (Chancellor) on Oct 15, 2010 at 15:15 UTC

    addyrocker:

    You're close:

    #!/usr/bin/perl my $count={}; my $total={}; while (<>) { chop; @F = split; @J=split(/\:/,$F[2]); if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$J[0]}++; $total{$J[0]} += $F[3]; } } foreach $key(sort(keys(%count))){ print "$key\t$count{$key}\t$total{$key}\n"; }

    Note: Untested....

    ...roboticus

Re: Calculate hourly averages
by toolic (Bishop) on Oct 15, 2010 at 15:19 UTC
    use strict; use warnings; use List::Util qw(sum); my %data; while (<DATA>) { my ($time, $value) = (split)[2..3]; next unless $time =~ /:/; my $hour = (split /:/, $time)[0]; push @{ $data{$hour} }, $value; } for my $hour (sort keys %data) { my @vals = @{ $data{$hour} }; print "Hour=$hour, Count=", scalar @vals, " Avg=", sum(@vals)/@val +s, "\n"; } __DATA__ itemid FROM_UNIXTIME(clock) value 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000 up to time 23:59 26603 2010-10-09 23:59:23 39393421.3333

    prints:

    Hour=00, Count=4 Avg=36986847.53675 Hour=01, Count=1 Avg=35600190.6667 Hour=02, Count=1 Avg=39175029.2 Hour=03, Count=1 Avg=35386478.4 Hour=04, Count=1 Avg=35773482.75 Hour=05, Count=1 Avg=39073367.5714 Hour=06, Count=1 Avg=34379049.8667 Hour=07, Count=1 Avg=33984045.7705 Hour=09, Count=1 Avg=36276301.8305 Hour=10, Count=1 Avg=37248366 Hour=23, Count=1 Avg=39393421.3333
Re: Calculate hourly averages
by dwm042 (Priest) on Oct 15, 2010 at 15:23 UTC
    Is this more what you were thinking of?

    #!/usr/bin/perl use warnings; use strict; my %hours; while(<DATA>) { next unless $_ =~ /\w+/; my @F = split; if ( $F[2] =~ /(\d{2})\:\d{2}:\d{2}/ ) { $hours{$1}{value} += $F[3]; $hours{$1}{count}++; } } for ( sort { $a <=> $b } keys %hours ) { print "Sum at $_ hours is ", $hours{$_}{value}, "\n"; print "Count at $_ hours is ", $hours{$_}{count}, "\n"; } __DATA__ 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000
    which yields the results:

    C:\Code>perl values_per_hour.pl Sum at 00 hours is 147947390.147 Count at 00 hours is 4 Sum at 01 hours is 35600190.6667 Count at 01 hours is 1 Sum at 02 hours is 39175029.2 Count at 02 hours is 1 Sum at 03 hours is 35386478.4 Count at 03 hours is 1 Sum at 04 hours is 35773482.75 Count at 04 hours is 1 Sum at 05 hours is 39073367.5714 Count at 05 hours is 1 Sum at 06 hours is 34379049.8667 Count at 06 hours is 1 Sum at 07 hours is 33984045.7705 Count at 07 hours is 1 Sum at 09 hours is 36276301.8305 Count at 09 hours is 1 Sum at 10 hours is 37248366 Count at 10 hours is 1 C:\Code>
Re: Calculate hourly averages
by moritz (Cardinal) on Oct 15, 2010 at 15:28 UTC
    If you want averages over the hours, you have to use the hour as a hash key:
    #!/usr/bin/perl use strict; use warnings; my %count; my %sum; while (<DATA>) { chomp; next if /^itemid/; my @F = split; my @J = split(/\:/,$F[2]); my $hour = $J[0]; if ( $J[0] =~ /[0-9][0-9]/ ){ $count{$hour}++; $sum{$hour} += $F[-1]; }; } for (sort keys %count ) { print "$_ ", $sum{$_} / $count{$_}, "\n"; } __DATA__ itemid FROM_UNIXTIME(clock) value 26603 2010-10-09 00:00:23 38784360.8136 26603 2010-10-09 00:01:23 36529742.6667 26603 2010-10-09 00:02:23 36966880.8000 26603 2010-10-09 00:03:23 35666405.8667 26603 2010-10-09 01:05:23 35600190.6667 26603 2010-10-09 02:06:23 39175029.2000 26603 2010-10-09 03:07:23 35386478.4000 26603 2010-10-09 04:08:27 35773482.7500 26603 2010-10-09 05:09:23 39073367.5714 26603 2010-10-09 06:10:23 34379049.8667 26603 2010-10-09 07:11:24 33984045.7705 26603 2010-10-09 09:12:23 36276301.8305 26603 2010-10-09 10:13:23 37248366.0000

    If the logfile entries are always sorted by hour, you don't even need the hash:

    use strict; use warnings; my $count = 0; my $sum = 0; my $previous_hour = -1; while (<DATA>) { chomp; next if /^itemid/; my @F = split; my @J = split(/\:/,$F[2]); my $hour = $J[0]; if ($hour != $previous_hour && $count) { print $previous_hour, " ", $sum / $count, "\n"; $sum = 0; $count = 0; } $sum += $F[-1]; $count++; $previous_hour = $hour; }
    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Calculate hourly averages
by locked_user sundialsvc4 (Abbot) on Oct 15, 2010 at 17:04 UTC

    It also occurs to me that you could stuff this into an SQL table and GROUP BY an expression that discards minutes.   Or, you could write a stub Perl script that pre-processes the data for such a purpose.   I make this suggestion on the basis that, “you might wish to get to SQL-land as soon as you can, for other reasons.”

      Thank you all for your valuable input it has really given me a lot to think about especially in terms of the different ways it can be done, will also give the SQL way a try thanks again!

Re: Calculate hourly averages
by Anonymous Monk on Oct 15, 2010 at 15:15 UTC
    my $re = qr/ ( \d+ ) # $1 id \s+ \d{4}-\d{2}-\d{2}\s ( \d{2} ) :\d{2}:\d{2} # $2 hour \s+ (\S+) # $3 val /x; while(<>){ if( /$re/ ){ my( $id, $hour, $value ) = ( $1 , $2, $3 ); ...