The quick and dirty approach would be to just carve it up on whitespace, like you are doing with awk anyway. The conversion is straightforward: use split or perl's -a option (as in my example above).
Regardless of how you parse the input, you'll probably find it worthwhile to compute the statistics for every file accessed in one pass through your log. That's far more efficient than reading the whole log once for each file you want stats on, and it's easy enough: just use a hash to maintain the data for each filename as you traverse the log.
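For instance, a rough one-pass sketch using the -a autosplit option (untested, and not the example referred to above; it assumes the filename and the size are whitespace-separated fields 6 and 9, i.e. indices 5 and 8 as in the snippet further down, and that the log is called access.log):

    perl -lane '
        $count{ $F[5] }++;          # hits seen for this filename
        $sum{ $F[5] }  += $F[8];    # running total of the size field
        END {
            printf "%s\t%.2f\n", $_, $sum{$_} / $count{$_} for sort keys %count;
        }
    ' access.log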
-sauoq
"My two cents aren't worth a dime.";
split is a good idea... would it be faster to use an array?
would it be faster to use an array?
If you mean "faster to use an array instead of a hash for collecting the data", then no, it would not be faster. I would split each line into an array during processing though.
The idea is to key the hash by the filenames. So, every time you come across, for instance, "/some/dir/file.html", you increase a count and a sum. The code might look something like this (untested):
while (<LOG>) {
    my @part = (split ' ', $_)[5,8];
    $hash{$part[0]}->[0]++;             # increase the count.
    $hash{$part[0]}->[1] += $part[1];   # increase the sum.
}
Note that the values of the hash are arrayrefs in order to store both the count and the sum associated with each filename. After you've munged your logs into raw data, you'll traverse the hash you created and compute the stats you want. Something like (again, untested):
for my $key (sort keys %hash) {
    my $avg = $hash{$key}->[1] / $hash{$key}->[0];   # sum/count.
    print "$key\t$avg\n";
}
-sauoq
"My two cents aren't worth a dime.";
0_seconds is a sed substitution... it's really just 0 or 1 or whatever was returned, so it would be 0_seconds, 1_second, etc. I was hoping it would be easier to read in the output file, but it doesn't really matter whether it's there or not, since the average is used.
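If that suffix is still on the field by the time Perl sees it, one way to handle it (just a sketch, assuming the value lands in $part[1] as in the loop above) is to keep only the leading digits before adding to the sum:

    my ($seconds) = $part[1] =~ /^(\d+)/;    # "3_seconds" becomes 3
    $hash{$part[0]}->[1] += $seconds;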