in reply to Creative sorting and totalling of large flatfiles (aka pivot tables)

In my experience, calculating summary statistics in Perl over a few million records is not terribly time consuming. If the CSV file has no quoting issues, a simple split-based loop suffices:
  open LOG, "<$log_file" or die "Could not open $log_file\n";
  my %ip_count;
  while (<LOG>) {
      my ($ip, $severity, $date, ...) = split /\s*,\s*/;
      $ip_count{$ip}++;
      # other summary stat calcs below
  }
For more complex CSV files, try Text::xSV, which handles the full CSV grammar.
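As a rough sketch of what that looks like with Text::xSV's row-at-a-time interface (the column names ip and severity below are just placeholders for whatever your header row actually contains):

  use Text::xSV;

  my $csv = Text::xSV->new();
  $csv->open_file($log_file);
  $csv->read_header();    # first row supplies the column names

  my %ip_count;
  while ($csv->get_row()) {
      # extract by header name; ip/severity are assumed column names
      my ($ip, $severity) = $csv->extract(qw(ip severity));
      $ip_count{$ip}++;
  }

The shape of the loop is the same as before; you just pay a little extra per row for proper quote handling.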

On an Athlon XP 2100 system with a gig of memory, most stats calculations over 1-10 million records typically took 1-10 minutes; even 30 such runs would only take a few hours. As long as you process one line at a time and have enough RAM to hold your summary hashes, the calculations should go quickly. For the pivot-table style totals, a nested hash works the same way; see the sketch below.
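A minimal sketch of that idea, again assuming ip and severity fields and the simple split-based parsing from above:

  my %count;
  while (<LOG>) {
      chomp;
      my ($ip, $severity) = split /\s*,\s*/;
      $count{$ip}{$severity}++;    # one cell of the "pivot table"
  }

  # dump the totals as rows of ip, severity, count
  for my $ip (sort keys %count) {
      for my $severity (sort keys %{ $count{$ip} }) {
          print "$ip,$severity,$count{$ip}{$severity}\n";
      }
  }

Memory use is proportional to the number of distinct (ip, severity) pairs, not to the number of input lines, which is why a few million records fit comfortably in RAM.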

-Mark
