Re: grepping for

Well, I got it working, although it is amazingly slow, using if (/^\/$dir\// && /$error/) {..do something..}.

#!/usr/bin/perl -w
#
# error-report.pl
#
# usage:
#  error-report.pl <dir> <error>
# where:
#  <dir> is the directory of the application
#  <error> is the number of the error code
#
# requires a filtered copy of a log file to exist.
#
# v0.1, jh8@ornl.gov, 5/12/2000
#

use strict;
# my $file = shift;
my $dir = shift;
my $error = shift;
$error = " ".$error." ";
open (LOG, "/usr/local/apache/logs/access_log") || die "Can't open log file: $!";
my (@entries, @log, %report, @list);

while (<LOG>) {
        my $url = (split())[6];
        if ($url =~ /^\/$dir\// && /$error/) {
                if (exists $report{$url}) {
                        $report{$url}++;
                } else {
                        $report{$url} = 1;
                }
        }
}
close LOG;
@list = sort {$report{$b} <=> $report{$a}}
        keys %report;
foreach (@list) {
        print "$_:  $report{$_}\n";
}

I would like to find a way to do it quicker, possibly using "grep" and slurping the file into memory, but since the file is over 200MB, I'm not too optimistic. Thanks for your help.

Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org
http://www.knoxlug.org

RE: Re: grepping for by lhoward (Vicar) on May 12, 2000 at 20:03 UTC
Instead of your current while loop, try this one, which should be somewhat faster. I replaced your first regular expression with a "substr eq" combination.
`my $dir='/acm/'; my $strlen=length ($dir); while (<LOG>) { my $url = (split())[6]; if ((substr($url,0,$strlen) eq $dir) && ($url=~/$error/)) { if (exists $report{$url}) { $report{$url}++; } else { $report{$url} = 1; } } }`
I benchmarked this using some sample data I made up, and it is about 35% faster than the original version. Your performance may vary. Regular expressions are slower than straight string operations, so if you can accomplish what you want with string operations and performance matters, you can tune your code by replacing some regular expressions with string operations (only feasible for simple regular expressions).
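
For anyone who wants to repeat that comparison, here is a minimal sketch using the core Benchmark module. The URL list is made-up sample data, and the sub names `count_regex` and `count_substr` are hypothetical. It only times the prefix test, not the whole log loop, so the numbers will not match the figure quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 URL fields, one in four under /acm/.
my $dir  = '/acm/';
my $len  = length $dir;
my @urls = map { my $top = $_ % 4 ? 'other' : 'acm'; "/$top/page$_.html" } 1 .. 1000;

# Count URLs under $dir using an anchored regex.
sub count_regex {
    my $n = 0;
    for my $url (@urls) { $n++ if $url =~ /^\Q$dir\E/ }
    return $n;
}

# Count URLs under $dir using a prefix comparison with substr.
sub count_substr {
    my $n = 0;
    for my $url (@urls) { $n++ if substr($url, 0, $len) eq $dir }
    return $n;
}

# Run each version for at least one CPU second and print a comparison table.
cmpthese(-1, { regex => \&count_regex, substr => \&count_substr });
```

The relative speeds depend on the Perl version and on how often the prefix actually matches, so it is worth running this against URLs taken from your own log.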
RE: RE: Re: grepping for by jjhorner (Hermit) on May 12, 2000 at 21:38 UTC
Your code looks good, but in order for me to get the right error codes, $error needs to be checked either against the entire line, or against `$_[-1]`, rather than against $url. Thanks again, JJ
RE: Re: grepping for by takshaka (Friar) on May 12, 2000 at 21:19 UTC
index(), in turn, may be a little faster than substr(). And replacing the regex on the error code should help, too.
`[code removed to protect the innocent]`
Update: index() is faster when the match occurs at the beginning of the string, but substr() is much better when there is no match at all, which happens quite a lot in this application.
`$dir = "/$dir/"; my $strlen = length $dir; while (<LOG>) { my ($url, $code) = (split)[6,8]; if (substr($url, 0, $strlen) eq $dir && $code == $error) { $report{$url}++; } }`
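
The same idea as a self-contained sketch that runs without a log file. The sub name `count_errors` and the sample lines are hypothetical, and it assumes Common Log Format, where splitting on whitespace puts the URL in field 6 and the status code in field 8. It compares the code with `eq` rather than `==` so that a malformed field does not trigger a numeric warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count hits per URL for requests under /$dir/ that returned status $error.
sub count_errors {
    my ($dir, $error, $lines) = @_;
    my $prefix = "/$dir/";
    my $len    = length $prefix;
    my %count;
    for my $line (@$lines) {
        my ($url, $code) = (split ' ', $line)[6, 8];
        next unless defined $url && defined $code;    # skip malformed lines
        $count{$url}++
            if substr($url, 0, $len) eq $prefix && $code eq $error;
    }
    return \%count;
}

# Made-up sample lines in Common Log Format.
my @sample = (
    '10.0.0.1 - - [12/May/2000:10:00:00 -0400] "GET /acm/a.html HTTP/1.0" 404 208',
    '10.0.0.2 - - [12/May/2000:10:00:01 -0400] "GET /acm/a.html HTTP/1.0" 404 208',
    '10.0.0.3 - - [12/May/2000:10:00:02 -0400] "GET /acm/b.html HTTP/1.0" 404 208',
    '10.0.0.4 - - [12/May/2000:10:00:03 -0400] "GET /acm/a.html HTTP/1.0" 200 512',
    '10.0.0.5 - - [12/May/2000:10:00:04 -0400] "GET /other/a.html HTTP/1.0" 404 208',
    '10.0.0.6 - - [12/May/2000:10:00:05 -0400] "GET /acm/404.html HTTP/1.0" 200 512',
);

my $report = count_errors('acm', '404', \@sample);
for my $url (sort { $report->{$b} <=> $report->{$a} || $a cmp $b } keys %$report) {
    print "$url:  $report->{$url}\n";
}
```

The last sample line shows why the code has to be tested against the status field and not the URL: `/acm/404.html` contains "404" but was served with a 200, so it is not counted.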
RE: Re: grepping for by chromatic (Archbishop) on May 13, 2000 at 02:41 UTC
Put the /o modifier on the end of those regexes. Since $dir and $error don't appear to change at all during the loop, you can optimize the regexes by only building them once. That could save some time.
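
A related way to build the patterns once is to precompile them with `qr//` before the loop. Note that this is an alternative to /o, not what the reply above proposes. The helper name `make_matcher` and the sample lines below are hypothetical, and the sketch keeps the original script's approach of matching the space-padded error code against the whole line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile both patterns once and return a closure that tests one log line.
# \Q...\E quotes any regex metacharacters in the arguments.
sub make_matcher {
    my ($dir, $error) = @_;
    my $dir_re   = qr{^/\Q$dir\E/};
    my $error_re = qr/ \Q$error\E /;
    return sub {
        my ($line) = @_;
        my $url = (split ' ', $line)[6];
        return undef unless defined $url;
        return ($url =~ $dir_re && $line =~ $error_re) ? $url : undef;
    };
}

# Made-up sample lines in Common Log Format.
my @lines = (
    '10.0.0.1 - - [12/May/2000:10:00:00 -0400] "GET /acm/a.html HTTP/1.0" 404 208',
    '10.0.0.2 - - [12/May/2000:10:00:01 -0400] "GET /acm/404.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [12/May/2000:10:00:02 -0400] "GET /other/a.html HTTP/1.0" 404 208',
);

my $match = make_matcher('acm', 404);
for my $line (@lines) {
    my $url = $match->($line);
    print "$url\n" if defined $url;
}
```

How much either approach saves depends on the Perl version, so it is worth timing on real data before relying on it.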