Here's the familiar drill: To parse a "common log format" file (assuming you're doing it yourself), the conventional wisdom says to write:

    while (<>) {
        my ($host, $ident_user, $auth_user, $date, $time, $time_zone, $method, $url, $protocol, $status, $bytes) =
            /^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;
        ...
    }

Yawn. Been there, done that, right?
Maybe not. Let's take a second look at $auth_user. Unless you're using basic authentication to password protect pages, you'll see this in your logs as '-'. No problem there. And if you are using basic authentication, you'll see a username. No problem there... unless the username contains whitespace, at which point the regexp fails to match. And since there's no check to see if it fails...
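To make the failure visible, here's a minimal sketch that runs the same pattern over two made-up log lines and counts the ones that don't match. The sample lines, the `$clf` variable, and the `@parsed`/`@rejected` arrays are illustrative assumptions, not part of any real log.

```perl
use strict;
use warnings;

# Same pattern as above: $auth_user is captured with (\S+).
my $clf = qr/^(\S+) (\S+) (\S+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

# Hypothetical sample lines; the second has a username with embedded spaces.
my @lines = (
    '127.0.0.1 - dws [27/Oct/2001:14:57:00 -0700] "GET /index.html HTTP/1.0" 200 1234',
    '127.0.0.1 - d w s [27/Oct/2001:14:57:00 -0700] "GET /index.html HTTP/1.0" 200 1234',
);

my (@parsed, @rejected);
for my $line (@lines) {
    # A failed match in list context returns an empty list, so the
    # assignment itself is false and the line can be set aside.
    if (my @fields = $line =~ $clf) {
        push @parsed, \@fields;
    }
    else {
        push @rejected, $line;
    }
}

printf "parsed %d, rejected %d\n", scalar @parsed, scalar @rejected;
```

Without a check like this, every variable in the list assignment is simply undef for the line that failed, and the loop carries on as if nothing happened.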
But can a username contain whitespace? Let's see.

    % htpasswd .htpasswd 'd w s'
    New password:
    Re-type new password:
    Adding password for user d w s
    %

D'oh! RFC1945 says you aren't supposed to be able to do this! (Update: RFC1945 is obsolete. RFC2617 suggests that embedded spaces are OK. Hm...)
The simple solution would seem to be "So, don't do that!", but here's where things get stranger. I've recently seen a case where somebody apparently presented an Authorization: header to a non-protected resource on my site, resulting in a bogus name appearing in the logs. I say "apparently" because I haven't been able to duplicate the behavior, and I can't think of any other way for the bogus username to have appeared. It's a minor annoyance, or the basis for a crude denial-of-accurate-service attack against log analysis software.
Fortunately, the solution is straightforward. All you have to do is change /^(\S+) (\S+) (\S+) \[ ... to /^(\S+) (\S+) (.+) \[ ... This makes the regex much less efficient, since it's going to backtrack to match '[', but it will resolve the problem, even if someone forges the username "dws [".
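Here's a quick sketch of the revised pattern at work, under the same assumptions as before: the three log lines are invented for illustration (a plain username, one with spaces, and the forged "dws ["), and `$clf` and `@users` are just names I picked.

```perl
use strict;
use warnings;

# Revised pattern: $auth_user is captured with (.+) instead of (\S+).
my $clf = qr/^(\S+) (\S+) (.+) \[([^:]+):(\d+:\d+:\d+) ([^\]]+)\] "(\S+) (.+?) (\S+)" (\S+) (\S+)$/;

# Hypothetical log lines, not taken from a real server.
my @lines = (
    '127.0.0.1 - dws [27/Oct/2001:14:57:00 -0700] "GET /index.html HTTP/1.0" 200 1234',
    '127.0.0.1 - d w s [27/Oct/2001:14:57:00 -0700] "GET /index.html HTTP/1.0" 200 1234',
    '127.0.0.1 - dws [ [27/Oct/2001:14:57:00 -0700] "GET /index.html HTTP/1.0" 200 1234',
);

my @users;
for my $line (@lines) {
    my @fields = $line =~ $clf;
    unless (@fields) {
        # Still check for failure; other malformed lines can turn up.
        warn "unparseable line: $line\n";
        next;
    }
    push @users, $fields[2];    # third capture is $auth_user
}

print "auth_user: '$_'\n" for @users;
```

All three lines match, and the captured usernames come out as dws, d w s, and dws [. The greedy (.+) first swallows the rest of the line, then backs off until the remainder of the pattern can match at the real date bracket.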