in reply to Creative sorting and totalling of large flatfiles (aka pivot tables)

In my experience, calculating summary statistics in Perl over a few million records is not terribly time consuming. If the CSV file has no quoting issues, a simple split-based loop suffices:
  open LOG, "<$log_file" or die "Could not open $log_file\n";
  my %ip_count;
  while (<LOG>) {
      my ($ip, $severity, $date, ...) = split /\s*,\s*/;
      $ip_count{$ip}++;
      # other summary stat calcs below
  }
For more complex CSV files, try Text::xSV, which handles the full CSV grammar.
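As a rough sketch of what that looks like with Text::xSV's row-at-a-time interface (the column names ip and severity below are just placeholders for whatever your header row actually contains):

  use Text::xSV;

  my $csv = Text::xSV->new();
  $csv->open_file($log_file);
  $csv->read_header();    # first row supplies the column names

  my %ip_count;
  while ($csv->get_row()) {
      # extract by header name; ip/severity are assumed column names
      my ($ip, $severity) = $csv->extract(qw(ip severity));
      $ip_count{$ip}++;
  }

The shape of the loop is the same as before; you just pay a little extra per row for proper quote handling.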

On an Athlon XP 2100 system with a gig of memory, most stats calculations over 1-10 million records typically took 1-10 minutes; even 30 such runs would only take a few hours. As long as you process one line at a time and have enough RAM to hold your summary hashes, the calculations should go quickly. For the pivot-table style totals, a nested hash works the same way; see the sketch below.
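A minimal sketch of that idea, again assuming ip and severity fields and the simple split-based parsing from above:

  my %count;
  while (<LOG>) {
      chomp;
      my ($ip, $severity) = split /\s*,\s*/;
      $count{$ip}{$severity}++;    # one cell of the "pivot table"
  }

  # dump the totals as rows of ip, severity, count
  for my $ip (sort keys %count) {
      for my $severity (sort keys %{ $count{$ip} }) {
          print "$ip,$severity,$count{$ip}{$severity}\n";
      }
  }

Memory use is proportional to the number of distinct (ip, severity) pairs, not to the number of input lines, which is why a few million records fit comfortably in RAM.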

-Mark
