in reply to Re: Data structure for statistics
in thread Data structure for statistics

Heh - yes, an example will surely help. In the long run, I want to display fancy graphs of things, possibly through SVG, PNG or OpenGL, but to start with, I want simple textual output, something like the following:

         avg/s  cur/s  avg/min  cur/min  avg/h  cur/h  avg/day  cur/day
google:      6     10      360      500  21600  43200      ...      ...
yahoo:       2      1      120      ...     ..     ..      ...      ...
...

Here, avg/s stands for the average number of hits per second across the whole reporting period, while cur/s stands for the current number of hits per second (or rather, the number of hits in the last second). avg/min stands for the average number of hits per minute, and cur/min for the current number of hits per minute, that is, the number of hits in the last 60 seconds. Likewise, /h is per hour and /day is per day. For applications where periods longer than 7 days are interesting, a 10-second or 1-minute resolution will be sufficient.
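To pin down those two definitions, here is a minimal sketch that computes them from a plain list of per-second hit counts. The sample data and the helper names average_per and current_per are made up for illustration; this is the straightforward reading of the definitions, not a proposal for the final data structure.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical per-second hit counts for one URL, oldest first:
# two minutes at 6 hits/s, then one final second with 10 hits.
my @hits_per_second = ((6) x 120, 10);

# avg/<unit>: mean hits per second over the whole reporting period,
# scaled up to a unit of $unit_seconds seconds.
sub average_per {
    my ($unit_seconds, @samples) = @_;
    return 0 unless @samples;
    return sum(@samples) / scalar(@samples) * $unit_seconds;
}

# cur/<unit>: hits seen in the most recent $unit_seconds seconds
# (or in everything recorded so far, if that is shorter).
sub current_per {
    my ($unit_seconds, @samples) = @_;
    my $n = @samples < $unit_seconds ? scalar(@samples) : $unit_seconds;
    return 0 unless $n;
    return sum(@samples[-$n .. -1]);
}

printf "avg/s=%.2f cur/s=%d avg/min=%.2f cur/min=%d\n",
    average_per(1,  @hits_per_second), current_per(1,  @hits_per_second),
    average_per(60, @hits_per_second), current_per(60, @hits_per_second);
```

With this sample data, cur/s is 10 (the last second) and cur/min is 59 * 6 + 10 = 364, while both averages are taken over all 121 recorded seconds.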

Currently, I'm still looking for a "better" approach than storing all the hits for all the seconds in my time windows, but so far haven't found one.
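For reference, the baseline I mean looks roughly like the sketch below: one counter per second in a fixed-size ring, plus a running sum so that the "current" figure does not need a rescan. The names new_window, record, advance and current are hypothetical, and the sketch assumes timestamps are integer seconds that never go backwards.

```perl
use strict;
use warnings;

# Create a window covering the last $size seconds.
sub new_window {
    my ($size) = @_;
    return {size => $size, slots => [(0) x $size], sum => 0, last => undef};
}

# Record $count hits at integer time $second.
sub record {
    my ($w, $second, $count) = @_;
    advance($w, $second);
    $w->{slots}[$second % $w->{size}] += $count;
    $w->{sum} += $count;
}

# Hits within the last $w->{size} seconds, as seen at time $second.
sub current {
    my ($w, $second) = @_;
    advance($w, $second);
    return $w->{sum};
}

# Expire the slots of seconds that have dropped out of the window.
sub advance {
    my ($w, $second) = @_;
    if (!defined $w->{last}) {
        $w->{last} = $second;
        return;
    }
    my $gap = $second - $w->{last};
    return if $gap <= 0;
    $gap = $w->{size} if $gap > $w->{size};    # a long silence clears every slot once
    for my $step (1 .. $gap) {
        my $slot = ($w->{last} + $step) % $w->{size};
        $w->{sum} -= $w->{slots}[$slot];
        $w->{slots}[$slot] = 0;
    }
    $w->{last} = $second;
}

my $minute = new_window(60);
record($minute, $_, 2) for 1 .. 90;            # 2 hits/s for 90 seconds
print "cur/min: ", current($minute, 90), "\n";
```

The cost is one slot per second of window: 60 for a minute, 3,600 for an hour, 86,400 for a day, per URL. Using 10-second or 1-minute buckets for the longer windows shrinks that, but it is still the "store everything in the window" approach.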

Update: Clarified averages and current.

Re^3: Data structure for statistics
by GrandFather (Saint) on Oct 05, 2008 at 22:35 UTC

    Something like:

    use strict;
    use warnings;

    my %urls = (
        google => {min => 0,  max => 20},
        yahoo  => {min => -3, max => 10},
        msn    => {min => -5, max => 5},
    );
    my @units = (
        {unit => 's',   tipAt => 60, updateAt => 20},
        {unit => 'min', tipAt => 60, updateAt => 1},
        {unit => 'h',   tipAt => 24, updateAt => 1},
        {unit => 'day', tipAt => 0,  updateAt => 0},
    );
    my %stats;

    buildStructure (\%stats, \%urls, @units);
    srand (1);

    for my $second (1 .. 24 * 60 * 60) {
        my %hits = genHits (%urls);

        for my $url (keys %hits) {
            my $hit         = $hits{$url};
            my $bump        = 1;
            my $updateHit   = 0;
            my $updateCount = 0;
            my $urlStats    = $stats{$url} ||= {};

            $urlStats->{total} += $hit;

            for my $unit (@units) {
                my $unitStats = $urlStats->{$unit->{unit}};

                if ($updateCount) {
                    $unitStats->{extraHits}  = $updateHit;
                    $unitStats->{extraCount} = $updateCount;
                }

                last unless $bump;

                $bump = 0;
                $unitStats->{lhits} ||= 0;
                $hit = $unitStats->{hits} += $hit;
                ++$unitStats->{count};

                if ($unit->{updateAt} and ! ($unitStats->{count} % $unit->{updateAt})) {
                    $updateHit   = $unitStats->{hits};
                    $updateCount = $unitStats->{count} / $unit->{tipAt};
                }

                next if ! $unit->{tipAt} or $unitStats->{count} < $unit->{tipAt};

                # Tipping time
                $unitStats->{lhits}      = $unitStats->{hits};
                $unitStats->{hits}       = 0;
                $unitStats->{count}      = 0;
                $unitStats->{extraHits}  = 0;
                $unitStats->{extraCount} = 0;
                $bump = 1;
            }
        }

        next if int rand (1000);

        # Gen header
        printf "%-10s" . ((' %7s %7s') x @units) . "\n", 'URL',
            map {("avg/$_->{unit}", "cur/$_->{unit}")} @units;

        # Show stats
        for my $url (keys %hits) {
            my $urlStats = $stats{$url};
            my $total    = $urlStats->{total};
            my $scale    = 1;

            printf "%-10s", $url;

            for my $unit (@units) {
                my $unitStats = $urlStats->{$unit->{unit}};
                my $avg       = $total / $second * $scale;
                my $cur       = $unitStats->{hits} + $unitStats->{lhits} + $unitStats->{extraHits};
                my $curScale  = $unitStats->{count} + $unitStats->{extraCount};

                $scale *= $unit->{tipAt} if $unit->{tipAt};
                $curScale += $unit->{tipAt} if $second > $scale;
                $cur /= $curScale if $curScale;
                printf ' %7d %7d', $avg, $cur;
            }

            print "\n";
        }
    }

    exit;

    sub buildStructure {
        my ($stats, $urls, @units) = @_;

        for my $url (keys %urls) {
            my $urlStats = $stats->{$url} = {};

            for my $unit (@units) {
                my $unitStats = $urlStats->{$unit->{unit}} = {};

                $unitStats->{$_} ||= 0
                    for qw(count hits lhits extraCount extraHits);
            }
        }
    }

    sub genHits {
        my (%urls) = @_;
        my %hits;

        for my $url (keys %urls) {
            my $hits = rand ($urls{$url}{max} - $urls{$url}{min}) + $urls{$url}{min};

            $hits = 0 if $hits < 0;
            $hits{$url} = int $hits;
        }

        return %hits;
    }

    Prints:

    ...
    URL          avg/s   cur/s avg/min cur/min   avg/h   cur/h avg/day cur/day
    google           9       9     573     571   34407   34409  825779  826904
    yahoo            3       3     208     210   12503   12502  300079  299128
    msn              1       0      60      59    3617    3617   86809   86696
    URL          avg/s   cur/s avg/min cur/min   avg/h   cur/h avg/day cur/day
    google           9       8     572     569   34365   34384  824771  826904
    yahoo            3       3     208     211   12524   12517  300588  299128
    msn              1       0      60      58    3601    3603   86427   86696
    ...

    Perl reduces RSI - it saves typing
Re^3: Data structure for statistics
by GrandFather (Saint) on Oct 05, 2008 at 20:09 UTC

    I presume avg is an average over some modest time window and that cur is an average over a narrow time window. How do those windows relate to the units (s, min, h ...)?

