in reply to Re: splitting a string that appears inconsistently in structure
in thread splitting a string that appears inconsistently in structure

Unfortunately, I don't find all of those to be true. For reference, here are some examples of entries that I have:

62.88.40.141 - - [09/Jan/2008:03:45:10 -0800] "GET /core_level.cgi?core=1 HTTP/1.1" 302 83 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)" "62.88.40.141"
(call this a normal-ish request)

10.16.0.2 - - [09/Jan/2008:02:20:39 -0800] "GET /home/eval_load.cgi?50" 200 2 "-" "-" "10.16.0.2"
(no version)

10.16.1.3 - - [10/Jan/2008:02:18:58 -0800] "GET /" 200 752 "-" "-" "10.16.1.3"
(no version, no ?, and nothing after the ?)

10.16.0.2 - - [19/Jan/2008:03:45:06 -0800] "GGG99994" 200 752 "-" "-" "10.16.0.2"
(here we have no method, no discernible request, and no version)

Hence the need to figure out how to detect what is there.

Re^3: splitting a string that appears inconsistently in structure
by fullermd (Vicar) on Jan 02, 2009 at 12:12 UTC
    10.16.0.2 - - [19/Jan/2008:03:45:06 -0800] "GGG99994" 200 752 "-" "-" "10.16.0.2"
    (here we have no method, no discernible request, and no version)

    Not exactly true. You have a method. It's just a really weird (and probably invalid) one. I'm not sure why your server would 200 it; I can only presume some slightly odd config.

    Take it in individual steps. First try splitting out into the 3 main pieces:

        my ($method, $uri, $proto, @extra) = split /\s+/, $request;
        die "Unexpected extra bits in request: @extra" if @extra > 0;
        die "No method" unless defined $method;
        # Or whatever other error-handling mechanism you want

    You shouldn't have any extra bits, because if you do, your $method, $uri, and $proto may not hold what you expect them to, so that needs error-checking.

    As well, you should have a method. The minimal possible HTTP request AFAIK would be a method of " ", with nothing else. That would leave all the vars undefined, and probably isn't something you care about anyway, so another error there.

    The protocol may not be there. But expect that in higher level code, or defined-or it to an empty string here if you prefer.
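
    Put together, the checks so far might look something like the sketch below. parse_request() is a made-up helper name, and returning nothing instead of die-ing is only one way to signal the error. The defined-or operator (//) needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# parse_request() is a hypothetical helper, not part of any module.
# It takes the quoted request field from a log line and returns a hash ref,
# or nothing if the field does not look like "METHOD [URI [PROTO]]".
sub parse_request {
    my ($request) = @_;
    my ($method, $uri, $proto, @extra) = split /\s+/, $request;

    return if @extra;                                  # more than three pieces
    return unless defined $method && length $method;   # empty or blank request

    $uri   //= '';   # a bare method with nothing after it
    $proto //= '';   # e.g. "GET /" with no HTTP/1.x on the end

    return { method => $method, uri => $uri, proto => $proto };
}

for my $request ('GET /core_level.cgi?core=1 HTTP/1.1', 'GET /', 'GGG99994') {
    my $r = parse_request($request) or next;
    print "method=[$r->{method}] uri=[$r->{uri}] proto=[$r->{proto}]\n";
}
```

    Callers can then test for an empty proto or uri instead of having to check defined-ness everywhere.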

    That leaves the URI. Using URI::Split as suggested above in Re: splitting a string that appears inconsistently in structure would be better than trying to split it up manually. Imagine, for instance, the case of having a '?' in the password; a simple regexp would give you a wrong answer then.

    Note that the $uri can be undefined. A request of just "GET " is interpreted as "GET /" (similarly with POST), and would leave $uri undefined after that split. You probably want to make sure it's defined (as an empty string in this case) before you pass it to uri_split(). The URI::Split docs say:

    The $path part is always present (but can be the empty string) and is thus never returned as "undef".

    So take care not to blow up if it's empty.
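
    As a rough sketch of that last step (split_target() is a name invented for the example, and the sample values come from the log lines earlier in the thread):

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# split_target() is a hypothetical wrapper. $uri is the second field of the
# split request and may be undef for a request with nothing after the method.
sub split_target {
    my ($uri) = @_;
    $uri = '' unless defined $uri;     # never hand undef to uri_split()
    my ($scheme, $auth, $path, $query, $frag) = uri_split($uri);
    return ($path, $query);            # $path is never undef, $query may be
}

for my $uri ('/core_level.cgi?core=1', '/', undef) {
    my ($path, $query) = split_target($uri);
    printf "path=[%s] query=[%s]\n", $path, defined $query ? $query : '(none)';
}
```

    The first value gives a path of /core_level.cgi and a query of core=1. The other two have no query at all, so $query comes back undef and needs the same kind of defined check.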

      That leaves the URI.

      For the sake of precision, by the by, it's not really a URI we've got here, it's just the path/query bit of it. But uri_split() does the right thing.

Re^3: splitting a string that appears inconsistently in structure
by zwon (Abbot) on Jan 02, 2009 at 11:53 UTC
    10.16.0.2 - - [19/Jan/2008:03:45:06 -0800] "GGG99994" 200 752 "-" "-" "10.16.0.2"

    Is that a real line from the log? The status code is 200 OK, so it looks like your Apache handled this request successfully, though it shouldn't have.

    I think it may be a good idea to handle malformed requests separately.
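
    One way to do that, purely as a sketch: sort the requests into two piles before any further parsing. The list of methods and the is_well_formed() helper below are assumptions for the example, so adjust them to whatever your server actually accepts.

```perl
use strict;
use warnings;

# Assumption: these are the only methods you expect to see in the log.
my %known_method = map { $_ => 1 } qw(GET HEAD POST PUT DELETE OPTIONS TRACE CONNECT);

# is_well_formed() is a hypothetical helper, not from any module.
sub is_well_formed {
    my ($request) = @_;
    my ($method, $uri, $proto, @extra) = split /\s+/, $request;
    return 0 if @extra;                                          # too many pieces
    return 0 unless defined $method && $known_method{$method};   # odd or missing method
    return 0 if defined $proto && $proto !~ m{^HTTP/\d+\.\d+$};  # junk where the version goes
    return 1;
}

my (@good, @malformed);
for my $request ('GET /core_level.cgi?core=1 HTTP/1.1', 'GET /', 'GGG99994') {
    if (is_well_formed($request)) { push @good,      $request }
    else                          { push @malformed, $request }
}

print "good: $_\n"      for @good;
print "malformed: $_\n" for @malformed;
```

    With the sample lines from the original post, "GGG99994" lands in @malformed and the other two in @good.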

Re^3: splitting a string that appears inconsistently in structure
by Anonymous Monk on Jan 02, 2009 at 08:09 UTC
    What's the format string for that log?