raiten has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to dispatch the contents of an Apache log depending on a regexp, and I have a problem. It seems that lines which don't match my regexp end up in the matched output ...
example command-line:
$ cat access_log | perl -pe 'if (s/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*"(GET|POST|HEAD) (.*?) HTTP\/.*$/$1,$3/) { print } else { print STDERR }' 2>out.err > out.csv
[...]
72.30.161.243,/
72.30.161.243,/
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 - "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 -
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516 "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516
96.243.255.188,//phpMyAdmin/
[...]
STDERR contains only the expected lines (those not matching the regexp).
corresponding part of the source file:
72.30.161.243 - - [03/Oct/2009:17:21:43 +0200] "GET / HTTP/1.0" 404 516 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
72.30.161.243 - - [03/Oct/2009:17:21:43 +0200] "GET / HTTP/1.0" 404 516
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 - "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:42 +0200] "-" 408 -
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516 "-" "-"
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516
96.243.255.188 - - [04/Oct/2009:00:26:17 +0200] "GET //phpMyAdmin/ HTTP/1.1" 404 516 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
96.243.255.188 - - [04/Oct/2009:00:26:17 +0200] "GET //phpMyAdmin/ HTTP/1.1" 404 516
Has anyone encountered a similar bug? Or is it my regexp? It seems hard to believe that it matched the CONNECT line ...
Normally, out.csv should contain only CSV lines.
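The symptom described above is consistent with the documented behavior of perl's -p switch rather than with a regexp problem: -p wraps the code in a loop that prints $_ to STDOUT after every line, whether or not the substitution succeeded. Under that reading, an unmatched line goes once to STDERR (explicit print) and once to STDOUT (implicit print), and a matched line is printed twice. The following is a minimal sketch, assuming that diagnosis is right, which swaps -p for -n (same loop, no implicit print). The file name sample_access_log is hypothetical, and its three lines are taken from the log excerpt in the question with the wrapped parts rejoined.

```shell
# Hypothetical sample input: three log lines quoted in the question.
cat > sample_access_log <<'EOF'
72.30.161.243 - - [03/Oct/2009:17:21:43 +0200] "GET / HTTP/1.0" 404 516 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
125.224.206.168 - - [04/Oct/2009:00:13:47 +0200] "CONNECT 203.188.201.253:25 HTTP/1.1" 404 516 "-" "-"
96.243.255.188 - - [04/Oct/2009:00:26:17 +0200] "GET //phpMyAdmin/ HTTP/1.1" 404 516 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"
EOF

# -n loops over the input like -p but never prints $_ on its own,
# so every line is written exactly once, by one of the explicit prints.
perl -ne 'if (s/^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).*"(GET|POST|HEAD) (.*?) HTTP\/.*$/$1,$3/) { print } else { print STDERR }' sample_access_log 2>out.err >out.csv
```

With this input, out.csv should hold only the two rewritten GET lines and out.err only the CONNECT line. The regexp itself is unchanged from the question; only the switch differs.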
Replies are listed 'Best First'.

Re: regexp matching bad stuff ...
by ikegami (Patriarch) on Oct 09, 2009 at 14:36 UTC

Re: regexp matching bad stuff ...
by ikegami (Patriarch) on Oct 09, 2009 at 15:04 UTC

Re: regexp matching bad stuff ...
by Fletch (Bishop) on Oct 09, 2009 at 15:08 UTC

Re: regexp matching bad stuff ...
by kennethk (Abbot) on Oct 09, 2009 at 14:47 UTC