in reply to grepping for

Well, I got it working, although it is amazingly slow, using if (/^\/$dir\// && /$error/) {..do something..}.
```perl
#!/usr/bin/perl -w
#
# error-report.pl
#
# usage:
#   error-report.pl <dir> <error>
# where:
#   <dir>   is the directory of the application
#   <error> is the number of the error code
#
# requires a filtered copy of a log file to exist.
#
# v0.1, jh8@ornl.gov, 5/12/2000
#
use strict;

my $dir   = shift;
my $error = shift;
# pad with spaces so the code only matches as a whole field
$error = " " . $error . " ";

open(LOG, "/usr/local/apache/logs/access_log") || die "Can't open log file: $!";

my (%report, @list);
while (<LOG>) {
    my $url = (split())[6];    # 7th whitespace-separated field: the request path
    if ($url =~ /^\/$dir\// && /$error/) {
        if (exists $report{$url}) {
            $report{$url}++;
        } else {
            $report{$url} = 1;
        }
    }
}
close LOG;

# most frequent URLs first
@list = sort { $report{$b} <=> $report{$a} } keys %report;
foreach (@list) {
    print "$_: $report{$_}\n";
}
```
I would like to find a way to do it quicker, possibly using "grep" and slurping the file into memory, but since the file is over 200MB, I'm not too optimistic. Thanks for your help.

Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org http://www.knoxlug.org

RE: Re: grepping for
by lhoward (Vicar) on May 12, 2000 at 20:03 UTC
    Instead of your current while loop, try this one, which should be somewhat faster. I replaced your first regular expression with a "substr eq" combination.
    ```perl
    my $dir = '/acm/';
    my $strlen = length($dir);
    while (<LOG>) {
        my $url = (split())[6];
        # cheap fixed-prefix test first; the regex only runs when the prefix matches
        if ((substr($url, 0, $strlen) eq $dir) && ($url =~ /$error/)) {
            if (exists $report{$url}) {
                $report{$url}++;
            } else {
                $report{$url} = 1;
            }
        }
    }
    ```
    I benchmarked this using some sample data I made up, and it is about 35% faster than the original version. Your performance may vary.

    Regular expressions are generally slower than plain string operations. So if you can accomplish what you want with string operations and performance matters, you can tune your code by replacing some regular expressions with string operations (this is only feasible for simple regular expressions).
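A minimal sketch of how such a comparison could be run with Perl's core Benchmark module. The sample URLs and the `/acm/` prefix are made up for illustration, and the subroutine names are hypothetical; actual timings will depend on your data and perl version.

```perl
#!/usr/bin/perl -w
use strict;
use Benchmark qw(cmpthese);

# Hypothetical sample data: request paths as they would appear in field 6.
my @urls = map { ("/acm/page$_.html", "/other/dir/page$_.html") } 1 .. 500;

my $dir    = '/acm/';
my $strlen = length $dir;

# Regex version: anchored prefix match (pattern hard-coded for /acm/).
sub count_regex {
    my $n = 0;
    for my $url (@urls) { $n++ if $url =~ /^\/acm\//; }
    return $n;
}

# String version: compare the first $strlen characters with eq.
sub count_substr {
    my $n = 0;
    for my $url (@urls) { $n++ if substr($url, 0, $strlen) eq $dir; }
    return $n;
}

# Both versions must agree before their speed is worth comparing.
die "mismatch\n" unless count_regex() == count_substr();

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex  => \&count_regex,
    substr => \&count_substr,
});
```

The correctness check comes first on purpose: a faster version that counts different lines is not an optimization.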

      Your code looks good, but to get the right error codes, $error needs to be matched either against the entire line or against $_[-1], not against $url. Thanks again, JJ
RE: Re: grepping for
by takshaka (Friar) on May 12, 2000 at 21:19 UTC
    index(), in turn, may be a little faster than substr(). And replacing the regex on the error code should help, too. [code removed to protect the innocent] Update:

    index() is faster when the match occurs at the beginning of the string, but substr() is much better when there is no match at all, because index() then keeps scanning the rest of the URL before giving up. Non-matching lines are the common case in this application.
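A small sketch of the two prefix tests being compared. index() returns the offset of the first occurrence, so checking for offset 0 is a prefix test; on a miss, though, it searches the whole string (and may find a later, irrelevant occurrence), while substr() only looks at the first length($dir) characters. The helper names and sample URLs are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

my $dir    = '/acm/';
my $strlen = length $dir;

# Prefix test with index(): true only when $dir occurs at offset 0.
# On a miss, index() keeps searching the rest of the string.
sub has_prefix_index {
    my ($url) = @_;
    return index($url, $dir) == 0;
}

# Prefix test with substr(): only the first $strlen characters are compared.
sub has_prefix_substr {
    my ($url) = @_;
    return substr($url, 0, $strlen) eq $dir;
}

for my $url ('/acm/index.html', '/other/acm/index.html', '/x') {
    printf "%-24s index=%d substr=%d\n", $url,
        has_prefix_index($url) ? 1 : 0, has_prefix_substr($url) ? 1 : 0;
}
```

Both tests give the same answers; the difference is only in how much work a non-matching line costs.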

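    ```perl
    $dir = "/$dir/";                  # e.g. "acm" becomes "/acm/"
    my $strlen = length $dir;
    while (<LOG>) {
        # field 6 is the request path, field 8 the HTTP status code
        my ($url, $code) = (split)[6,8];
        # string prefix test plus a numeric compare; no regex at all
        # (assumes $error holds the bare status code, e.g. 404)
        if (substr($url, 0, $strlen) eq $dir && $code == $error) {
            $report{$url}++;
        }
    }
    ```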
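For trying the field-based loop without a 200MB log, here is a self-contained sketch that reads a few made-up Common Log Format lines from an in-memory filehandle (open on a scalar reference, available since Perl 5.8). The sample lines, the directory `acm` and the code 404 are hypothetical, and `$error` is assumed to hold the bare status code rather than the space-padded string from the original script.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample log lines in Common Log Format.
my @lines = (
    '10.0.0.1 - - [12/May/2000:10:00:00 -0400] "GET /acm/a.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [12/May/2000:10:00:01 -0400] "GET /acm/a.html HTTP/1.0" 404 210',
    '10.0.0.3 - - [12/May/2000:10:00:02 -0400] "GET /acm/b.html HTTP/1.0" 200 512',
    '10.0.0.4 - - [12/May/2000:10:00:03 -0400] "GET /other/c.html HTTP/1.0" 404 210',
    '10.0.0.5 - - [12/May/2000:10:00:04 -0400] "GET /acm/d.html HTTP/1.0" 404 210',
);
my $log = join("\n", @lines) . "\n";

my $dir    = "/acm/";
my $error  = 404;    # bare status code, compared numerically
my $strlen = length $dir;

my %report;
open(my $fh, '<', \$log) or die "Can't open in-memory log: $!";
while (<$fh>) {
    # split on whitespace: field 6 is the path, field 8 the status
    my ($url, $code) = (split)[6, 8];
    if (substr($url, 0, $strlen) eq $dir && $code == $error) {
        $report{$url}++;
    }
}
close $fh;

# Most frequent URLs first, as in the original script.
for my $url (sort { $report{$b} <=> $report{$a} } keys %report) {
    print "$url: $report{$url}\n";
}
```

Swapping `\$log` for the real log path turns this back into the production loop.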
RE: Re: grepping for
by chromatic (Archbishop) on May 13, 2000 at 02:41 UTC
    Put the /o modifier on the end of those regexes. Since $dir and $error don't appear to change at all during the loop, you can optimize the regexp by only building it once. Could save some time.
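A minimal sketch of the /o suggestion applied to the original regex loop. With /o, perl interpolates `$dir` and `$error` the first time each match runs and reuses that compiled pattern afterwards, so it is only safe when neither variable changes for the life of the program. The sample lines and argument values are made up, and the `\Q...\E` quoting is an added assumption so that regex metacharacters in the arguments are matched literally.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical arguments, as the original script would receive them.
my $dir   = 'acm';
my $error = ' 404 ';    # space-padded, as in the original script

my @lines = (
    '10.0.0.1 - - [12/May/2000:10:00:00 -0400] "GET /acm/a.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [12/May/2000:10:00:01 -0400] "GET /acm/b.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [12/May/2000:10:00:02 -0400] "GET /x/acm/c.html HTTP/1.0" 404 210',
);

my %report;
for (@lines) {
    my $url = (split)[6];
    # /o: each pattern is compiled once, on first use, then reused
    if ($url =~ /^\/\Q$dir\E\//o && /\Q$error\E/o) {
        $report{$url}++;
    }
}
print "$_: $report{$_}\n" for sort keys %report;
```

Modern perls already skip recompiling when the interpolated string has not changed, so the gain from /o is usually small today; building the pattern once with qr// before the loop is the commonly recommended alternative.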