Re: Data structure for statistics

I'm pretty sure you're not going to go with my solution, but having just produced some live stats, I approached it a bit differently. I put the raw data into a database, and then a separate process scanned the database for rows whose derived fields were still NULL and calculated them. Then I just used some SQL to query counts, grouped by interesting fields as appropriate. Right before each query, I would run DELETE FROM LOGS WHERE STAMP < CURRENT TIMESTAMP - 7 DAYS to keep the logs circular.
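A minimal sketch of the ingest-then-backfill split described above, using Python's built-in sqlite3 module rather than whatever database the post actually used. The table name `logs`, the columns `raw`, `status` and `path`, and the `GET /path 200` line format are all hypothetical stand-ins. The point is that the hot path only stores raw lines, and a separate pass parses each row exactly once.

```python
import sqlite3

# Hypothetical schema: `raw` holds the unparsed log line; `status` and
# `path` are derived fields that stay NULL until the backfill pass runs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    id     INTEGER PRIMARY KEY,
    stamp  TEXT NOT NULL DEFAULT (datetime('now')),
    raw    TEXT NOT NULL,
    status TEXT,
    path   TEXT
)
"""


def ingest(conn, lines):
    """Store raw lines only; no parsing happens on the hot path."""
    conn.executemany("INSERT INTO logs (raw) VALUES (?)", [(l,) for l in lines])
    conn.commit()


def parse_line(raw):
    """Hypothetical parser: assumes lines look like 'GET /path 200'."""
    _method, path, status = raw.split()
    return status, path


def backfill(conn):
    """Find rows whose derived fields are still NULL and compute them once."""
    rows = conn.execute("SELECT id, raw FROM logs WHERE status IS NULL").fetchall()
    for row_id, raw in rows:
        status, path = parse_line(raw)
        conn.execute(
            "UPDATE logs SET status = ?, path = ? WHERE id = ?",
            (status, path, row_id),
        )
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    ingest(conn, ["GET /index.html 200", "GET /missing 404", "GET /index.html 200"])
    print(backfill(conn))  # 3: every new row gets parsed
    print(backfill(conn))  # 0: already-parsed rows are skipped
```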

This meant that I worked out what was important on each line only once, and then pulled the numbers from the database as many times as needed: about 168 times (once an hour for the seven days before that line got deleted).
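The hourly reporting step can be sketched the same way. This assumes a hypothetical `logs` table that has already been backfilled, with `stamp` stored as SQLite `YYYY-MM-DD HH:MM:SS` text. SQLite has no `CURRENT TIMESTAMP - 7 DAYS` arithmetic, so the prune uses `datetime(?, '-7 days')` instead. Each report only reads the precomputed `status` column, so a row parsed once can feed 24 * 7 = 168 hourly reports before it is deleted.

```python
import sqlite3

# Hourly reports over a 7-day retention window: 24 * 7 = 168 per row.
REPORTS_PER_ROW = 24 * 7


def hourly_report(conn, now="now"):
    """Prune rows older than 7 days, then count by a precomputed field.

    `now` is any SQLite time string; pinning it makes the example repeatable.
    """
    # Keep the table circular (SQLite spelling of the 7-day DELETE).
    conn.execute("DELETE FROM logs WHERE stamp < datetime(?, '-7 days')", (now,))
    conn.commit()
    # No re-parsing here: only the derived `status` column is read.
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM logs "
        "WHERE status IS NOT NULL GROUP BY status ORDER BY status"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (stamp TEXT NOT NULL, status TEXT)")
    conn.executemany(
        "INSERT INTO logs VALUES (?, ?)",
        [
            ("2008-09-20 12:00:00", "200"),  # older than 7 days: pruned
            ("2008-10-05 19:00:00", "200"),
            ("2008-10-05 19:30:00", "404"),
            ("2008-10-05 19:45:00", "200"),
        ],
    )
    print(hourly_report(conn, now="2008-10-05 20:00:00"))  # {'200': 2, '404': 1}
```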

Performance-wise, mojotoad's approach (which seemed to parse the entire data set each time) took about 3 seconds to produce the stats, whereas mine hovers steadily between 0.3 and 0.7 seconds, usually around 0.4 seconds. That's roughly an order of magnitude faster, because most of the hard work is done outside the report generation. Mind you, I may also be using an order of magnitude more memory; hard to tell ;-) though I do have more stats than he did.

Re^2: Data structure for statistics
by Corion (Patriarch) on Oct 05, 2008 at 20:10 UTC

I've considered using an SQLite database as well. It makes formulating the statistics I want to collect much easier, and of course extending it is just a matter of coming up with the right SQL instead of having to muck around with the counters manually. What has kept me away from it so far is that I'm reluctant to store all the data in memory, and that more or less a full table scan would need to be done every second. Of course, I could cheat here and only update the seconds-resolution statistics every second and update (say) the minute-resolution statistics every five seconds...
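One way to sketch that "cheat" of refreshing coarser resolutions less often, assuming an in-memory SQLite table `events` with integer epoch-second timestamps. The names, refresh intervals and window sizes are illustrative, not taken from the post. An index on the timestamp column should let SQLite answer each windowed count with an index range scan rather than a full table scan.

```python
import sqlite3

# Illustrative settings: how often (in one-second ticks) each resolution is
# recomputed, and how many seconds of events it summarises.
REFRESH_EVERY = {"second": 1, "minute": 5}
WINDOW = {"second": 1, "minute": 60}


def due(tick):
    """Resolutions whose statistics should be recomputed on this tick."""
    return [res for res, every in REFRESH_EVERY.items() if tick % every == 0]


def refresh(conn, tick, now):
    """Recompute only the due resolutions, each with a windowed range query."""
    stats = {}
    for res in due(tick):
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM events WHERE ts > ? AND ts <= ?",
            (now - WINDOW[res], now),
        ).fetchone()
        stats[res] = count
    return stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
    conn.executemany(
        "INSERT INTO events VALUES (?)", [(t,) for t in (90, 130, 159, 160, 160)]
    )
    print(due(3))                          # ['second']
    print(refresh(conn, tick=5, now=160))  # {'second': 2, 'minute': 4}
```

On ticks that are not multiples of five only the one-second window is queried, so the more expensive minute-level count runs a fifth as often.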

It seems that I'll have to benchmark SQLite and its memory/disk requirements and compare them to the bytestring version.
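A rough starting point for that benchmark, in Python. The bytestring version isn't shown in this thread, so this assumes one little-endian 32-bit counter per second-slot packed into a byte string; `SLOTS`, `new_counters` and `bump` are hypothetical. For SQLite it reports `page_count * page_size` of an in-memory database holding raw timestamps, which is only a proxy for the process's actual memory use.

```python
import sqlite3
import struct

# Hypothetical layout: one little-endian unsigned 32-bit counter per second
# of a day, packed into a single byte string.
SLOTS = 86_400
COUNTER = struct.Struct("<I")


def new_counters(slots=SLOTS):
    """Allocate the packed counters; size depends only on the slot count."""
    return bytearray(COUNTER.size * slots)


def bump(buf, slot):
    """Increment one counter in place."""
    offset = slot * COUNTER.size
    (value,) = COUNTER.unpack_from(buf, offset)
    COUNTER.pack_into(buf, offset, value + 1)


def sqlite_bytes(n_events):
    """Approximate size of an in-memory SQLite DB holding n_events timestamps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts INTEGER NOT NULL)")
    conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
    conn.executemany(
        "INSERT INTO events VALUES (?)", ((i % SLOTS,) for i in range(n_events))
    )
    conn.commit()
    (pages,) = conn.execute("PRAGMA page_count").fetchone()
    (page_size,) = conn.execute("PRAGMA page_size").fetchone()
    conn.close()
    return pages * page_size


if __name__ == "__main__":
    print("bytestring:", len(new_counters()), "bytes")  # 345600 bytes
    for n in (10_000, 100_000):
        print(f"sqlite, {n} events:", sqlite_bytes(n), "bytes")
```

The byte string's size is fixed by the number of slots, while the SQLite figure grows with the number of stored events, which is the trade-off worth measuring against real traffic.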
