Re: Design Approach (Processing Huge Hash)

Storing every line of every logfile seems like the wrong approach here. Both of your objectives are better served by parsing as you go.

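For printing the top x lines, assuming each file is already ordered on the key you care about (e.g. by timestamp), simply read the first x lines from each file, sort those together, and print the top x of the result. You'll never need to read most of the lines in each file.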
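A minimal sketch of that idea, assuming every input is already sorted on the same key that a comparator `$cmp` (supplied by you) uses, so only the first x lines of any one file can end up in the overall top x. `top_x_lines` is a hypothetical helper name, and the in-memory filehandles just stand in for real logfiles.

```perl
use strict;
use warnings;

# Hypothetical helper: return the overall top $x lines from several handles.
# Assumes each handle is already sorted by the key $cmp compares on, so only
# the first $x lines of any one file can make it into the overall top $x.
sub top_x_lines {
    my ($x, $cmp, @handles) = @_;
    my @candidates;
    for my $fh (@handles) {
        for (1 .. $x) {
            my $line = <$fh>;
            last unless defined $line;    # file shorter than $x lines
            chomp $line;
            push @candidates, $line;
        }
    }
    my @sorted = sort { $cmp->($a, $b) } @candidates;
    splice(@sorted, $x) if @sorted > $x;  # drop everything past the top $x
    return @sorted;
}

# Usage with in-memory "files" (each already in ascending order).
my ($log1, $log2) = ("a\nc\ne\n", "b\nd\nf\n");
open my $fh1, '<', \$log1 or die "open: $!";
open my $fh2, '<', \$log2 or die "open: $!";
print "$_\n" for top_x_lines(3, sub { $_[0] cmp $_[1] }, $fh1, $fh2);
# prints a, b, c (one per line)
```

At most x * (number of files) lines are ever held in memory, no matter how large the logs are. If the files are not pre-sorted, this shortcut does not apply and you would have to scan every line.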

For summary data, just keep running totals of the values you care about. Min and max are easy, and even running averages and similar statistics are simple to maintain:

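use strict;
use warnings;

my ($avg, $num_contrib) = (0, 0);
while (my $line = <STDIN>) {      # one line at a time; foreach (<FH>) would slurp the whole file
    my ($val) = $line =~ /(-?\d+(?:\.\d+)?)/ or next;   # placeholder: take the first number on the line
    $avg += ($val - $avg) / ++$num_contrib;             # incremental mean update
}
print "average of $num_contrib values: $avg\n";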
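The update line in the loop above is the incremental form of the arithmetic mean. Writing x̄_n for the value of `$avg` after `$num_contrib` reaches n, and x_n for the n-th value read:

```latex
\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i
          = \frac{(n-1)\,\bar{x}_{n-1} + x_n}{n}
          = \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}
```

Starting from x̄_0 = 0, the first step gives x̄_1 = x_1, which is why initializing `$avg` to 0 is safe. Only the current average and the count are kept, never the list of values.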

Other kinds of running statistics can be computed with similar incremental algorithms.
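As one example of such an algorithm, here is a sketch of Welford's online method for a running variance, which extends the same mean update with one extra accumulator. The names `new_stats`, `add_value` and `variance` are illustrative only, not an existing module's API.

```perl
use strict;
use warnings;

# Welford's online algorithm: running mean and variance in a single pass,
# without storing the values. new_stats/add_value/variance are
# illustrative names, not an existing module's API.
sub new_stats { return +{ n => 0, mean => 0, m2 => 0 } }

sub add_value {
    my ($s, $x) = @_;
    $s->{n}++;
    my $delta = $x - $s->{mean};
    $s->{mean} += $delta / $s->{n};            # same update as the running average
    $s->{m2}   += $delta * ($x - $s->{mean});  # old deviation times new deviation
    return;
}

# Population variance; use m2 / (n - 1) for the sample variance instead.
sub variance {
    my ($s) = @_;
    return $s->{n} ? $s->{m2} / $s->{n} : 0;
}

my $stats = new_stats();
add_value($stats, $_) for 2, 4, 4, 4, 5, 5, 7, 9;
printf "mean %.1f, variance %.1f\n", $stats->{mean}, variance($stats);
# prints: mean 5.0, variance 4.0
```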

Naturally, for efficiency you'd want to combine all of these so that you make only a single pass through each file.
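A rough sketch of what that single pass might look like, assuming a hypothetical log format where the value of interest is the first number on each line. `scan_file` is an illustrative name. It collects the first x lines for the top-x step and updates min, max and the running mean as each line goes by.

```perl
use strict;
use warnings;

# One pass per file: keep the first $x lines (for the top-x step) and
# update min, max and the running mean as each value goes by.
# Assumption (hypothetical log format): the value of interest is the first
# number on each line; lines without a number are skipped for the stats.
sub scan_file {
    my ($fh, $x) = @_;
    my %s = (head => [], n => 0, mean => 0, min => undef, max => undef);
    while (my $line = <$fh>) {
        chomp $line;
        push @{ $s{head} }, $line if @{ $s{head} } < $x;
        my ($val) = $line =~ /(-?\d+(?:\.\d+)?)/ or next;
        $s{min} = $val if !defined $s{min} || $val < $s{min};
        $s{max} = $val if !defined $s{max} || $val > $s{max};
        $s{mean} += ($val - $s{mean}) / ++$s{n};
    }
    return \%s;
}

my $log = "a 5\nb 3\nc 10\nd 2\n";
open my $fh, '<', \$log or die "open: $!";
my $s = scan_file($fh, 2);
print "head: @{ $s->{head} }\n";                                   # head: a 5 b 3
print "n=$s->{n} min=$s->{min} max=$s->{max} mean=$s->{mean}\n";   # n=4 min=2 max=10 mean=5
```

If you only need the top-x lines and not the statistics, you can stop reading after x lines, as in the earlier approach; once the summary data is wanted, the whole file has to be read anyway, so collecting the head along the way costs nothing extra.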

Ron Steinke
<rsteinke@w-link.net>
