Re: grepping for
by lhoward (Vicar) on May 12, 2000 at 18:42 UTC
I think you are on the right path with your first example. You
probably just need to strip the protocol and host information out of the
URL first. You could use the URI module to parse the URL apart and reg-match on the path part.
use URI;
my $dir = 'acm';
# ... loop through the file;
# $foo contains the URL from the logfile
my $u = URI->new($foo);
if ($u->path =~ m{^/\Q$dir\E/}) {
    # do stuff in here 'cause we got an "/acm/" line
}
That probably isn't the fastest or most efficient way to do it,
so you may want to tune the code if this is something that
you will do often.
Re: grepping for
by chromatic (Archbishop) on May 12, 2000 at 19:07 UTC
my $dir = "/acm";
# open file
while (<LOG>) {
next unless /^\Q$dir\E/;
# do something with $_ because it matches
}
If you slurp all of the lines into an array, you could also use the grep function.
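A rough sketch of that grep approach, assuming the lines are already in an array. The sample lines are invented and hold only the request path, which is what the loop above expects to find at the start of each line:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for lines read from the log.
my @lines = ("/acm/index.html\n", "/other/page.html\n", "/acm/docs/faq.html\n");

my $dir = "/acm";

# grep returns every element for which the block is true.
my @matches = grep { /^\Q$dir\E/ } @lines;

print @matches;
```

Note that without a trailing slash in $dir this would also pick up something like "/acme/...", so setting $dir to "/acm/" may be safer.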
Re: grepping for
by mikfire (Deacon) on May 12, 2000 at 18:48 UTC
Try using m#$dir# instead of /$dir/. Do not hold me to this,
but I believe the slashes in the regex are confusing perl.
Perl first does double-quotish expansion ( how does this phrase
seem to appear in all my posts? ) before passing anything to
the regex engine. In this case, the regex engine is seeing
something like /^\//acm\//. Which makes me think it is
returning any line beginning with a '/'.
You could also try using the quotemeta modifiers \Q and \E, which cause perl
to protect all special characters with backslashes. To do this,
your regex would look like /\Q$dir\E/. I had mixed luck
with the quotemeta stuff a long time ago. It may work better
now or your mileage may vary.
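For what it's worth, a small experiment along these lines may help settle it. As far as I can tell, slashes that arrive through an interpolated variable do not end the pattern (only a literal / typed in the source does), so the sketch below checks that alongside the \Q...\E behaviour. The variable names and sample strings are made up for illustration:

```perl
use strict;
use warnings;

my $dir  = "/acm/";
my $line = "/acm/index.html";

# Slashes inside an interpolated variable are part of the pattern,
# not delimiters.
my $plain = $line =~ /^$dir/ ? 1 : 0;

# Alternate delimiters avoid backslashing literal slashes in the pattern.
my $alt = $line =~ m#^/acm/# ? 1 : 0;

# \Q...\E backslash-escapes regex metacharacters in the variable.
my $meta   = "a.c";                            # hypothetical name with a "."
my $loose  = "abc" =~ /^$meta$/     ? 1 : 0;   # "." matches any character
my $strict = "abc" =~ /^\Q$meta\E$/ ? 1 : 0;   # "." must be a literal dot

print "plain=$plain alt=$alt loose=$loose strict=$strict\n";
```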
Mik
Mik Firestone ( perlus bigotus maximus )
RE: grepping for
by Maqs (Deacon) on May 12, 2000 at 19:11 UTC
if ($url =~ /\/$dir\//) {
    # ... some stuff ...
};
IMHO, if you have a long path and the directory is not at the very start of it, the "^" in your regexp might keep it from matching.
--
With best regards
Maqs.
Re: grepping for
by jjhorner (Hermit) on May 12, 2000 at 19:50 UTC
Well, I got it working, although it is amazingly slow, using if (/^\/$dir\// && /$error/) {..do something..}.
#!/usr/bin/perl -w
#
# error-report.pl
#
# usage:
# error-report.pl <dir> <error>
# where:
# <dir> is the directory of the application
# <error> is the number of the error code
#
# requires a filtered copy of a log file to exist.
#
# v0.1, jh8@ornl.gov, 5/12/2000
#
use strict;
# my $file = shift;
my $dir = shift;
my $error = shift;
$error = " ".$error." ";
open (LOG, "/usr/local/apache/logs/access_log") || die "Can't open log file: $!";
my (@entries, @log, %report, @list);
while (<LOG>) {
    my $url = (split())[6];
    if ($url =~ /^\/$dir\// && /$error/) {
        if (exists $report{$url}) {
            $report{$url}++;
        } else {
            $report{$url} = 1;
        }
    }
}
close LOG;
@list = sort {$report{$b} <=> $report{$a}}
    keys %report;
foreach (@list) {
    print "$_: $report{$_}\n";
}
I would like to find a way to do it quicker, possibly using "grep" and slurping the file into memory, but since the file is over 200MB, I'm not too optimistic.
Thanks for your help.
Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org http://www.knoxlug.org
|
|
Instead of your current while loop, try this one, which should be somewhat faster.
I replaced your first regular expression with a "substr eq"
combination.
my $dir='/acm/';
my $strlen=length ($dir);
while (<LOG>) {
    my $url = (split())[6];
    if ((substr($url,0,$strlen) eq $dir) && ($url=~/$error/)) {
        if (exists $report{$url}) {
            $report{$url}++;
        } else {
            $report{$url} = 1;
        }
    }
}
I benchmarked this using some sample data I made up
and it is about 35% faster
than the original version. Your performance may vary.
Regular expressions are generally slower than straight
string operations. So if you can accomplish what you want with string operations
and performance matters, then you can tune your code by replacing
some regular expressions with string operations (only feasible for
simple regular expressions).
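If you want to measure this on your own machine, a minimal sketch with the core Benchmark module might look like the following. The URLs are invented sample data, and the sub names are only for illustration; the numbers you get will depend on your perl version and your real log contents:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample URLs; real log data will behave differently.
my @urls;
for my $i (1 .. 500) {
    push @urls, "/acm/page$i.html", "/other/page$i.html";
}

my $dir    = '/acm/';
my $strlen = length $dir;

# Count matching URLs with a prefix comparison.
sub by_substr {
    my $n = 0;
    for my $url (@urls) {
        $n++ if substr($url, 0, $strlen) eq $dir;
    }
    return $n;
}

# Count matching URLs with an anchored regex.
sub by_regex {
    my $n = 0;
    for my $url (@urls) {
        $n++ if $url =~ /^\Q$dir\E/;
    }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { 'substr' => \&by_substr, 'regex' => \&by_regex });
```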
|
|
Your code looks good, but in order for me to get the right error codes, $error needs to be checked either against the entire line or against the status-code field of the split line, not against $url.
Thanks again,
JJ
|
|
$dir = "/$dir/";
my $strlen = length $dir;
while (<LOG>) {
    my ($url, $code) = (split)[6,8];
    if (substr($url, 0, $strlen) eq $dir && $code == $error) {
        $report{$url}++;
    }
}
|
|
Put the /o modifier on the end of those regexes. Since $dir and $error don't appear to change at all during the loop, you can optimize the regexp by only building it once. Could save some time.
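An alternative to /o that serves the same purpose (not what is suggested above, just another way to build the patterns once) is to precompile them with qr// before the loop. A minimal sketch, assuming made-up log lines in common log format and hypothetical values for $dir and $error:

```perl
use strict;
use warnings;

my $dir   = 'acm';    # hypothetical values standing in for the
my $error = ' 404 ';  # script's command-line arguments

# Compile each pattern once, before the loop starts.
my $dir_re   = qr{^/\Q$dir\E/};
my $error_re = qr{\Q$error\E};

# Made-up log lines in common log format, for illustration only.
my @lines = (
    '10.0.0.1 - - [12/May/2000:18:42:00 -0400] "GET /acm/a.html HTTP/1.0" 404 210',
    '10.0.0.2 - - [12/May/2000:18:43:00 -0400] "GET /acm/b.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [12/May/2000:18:44:00 -0400] "GET /xyz/a.html HTTP/1.0" 404 210',
);

my %report;
for (@lines) {
    my $url = (split)[6];    # seventh whitespace-separated field: the path
    $report{$url}++ if $url =~ $dir_re && /$error_re/;
}

print "$_: $report{$_}\n" for sort keys %report;
```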
RE: grepping for
by turnstep (Parson) on May 12, 2000 at 19:22 UTC
If it is a standard log file, it will not have host and protocol
information. Try something like this:
$dir = "/acm/"; ## avoid worrying about escaping slashes
if (m/(\Q$dir\E\S*) /) {   ## \S* stops at the first space; .* would run to the last one
    print "The path is $1!\n";
}
Your first method should work, by the way. Perhaps give us
an example line from the log file and what result you get?
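For illustration only, here is that kind of capture applied to a made-up line in common log format (the real log may well look different). This sketch uses \S* rather than .* so the capture stops at the first space after the path:

```perl
use strict;
use warnings;

# A made-up line in common log format, for illustration only.
my $line = '10.0.0.1 - - [12/May/2000:19:22:00 -0400] "GET /acm/docs/faq.html HTTP/1.0" 200 1024';

my $dir = "/acm/";

# \S* stops at the first whitespace, so only the path is captured.
my $path;
if ($line =~ m/(\Q$dir\E\S*) /) {
    $path = $1;
    print "The path is $path!\n";
}
```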