I typically do something like this:
zgrep -h '2001:13:5' *access_log* | logparse -p="%o" | sort | uniq -c | sort -r | head -30
That gets me the top 30 hosts that referred traffic during the ten-minute block between 1:50 and 2:00 PM (the '2001:13:5' pattern matches that slice of the timestamps). If I know there was an issue, like a spike or a dip, I use a filter like this to investigate what may have caused it. It works well with tail -f too.
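For the tail -f case, sort and uniq need the complete input, so the live version is just the filter itself; a sketch:

# live sketch: watch the referer field as requests arrive
# (the counting stages are dropped; they need the whole input)
tail -f access_log | logparse -p="%o"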
I thought (briefly) about auto-building the regexes by analyzing a conf file and/or the actual log, but it was quicker to write and hard-code them myself, and I don't change log file formats very often. You can do some pretty funky things with log file formats, and it doesn't seem easy to anticipate all of those possibilities.
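For what it's worth, the hard-coded approach is only a line or two per format. A sketch for Apache's combined format (logparse's actual regexes may well differ):

# one hard-coded regex for the combined log format; the sample
# line is the one from the Apache docs
my $line = '127.0.0.1 - - [10/Oct/2001:13:55:36 -0700] '
         . '"GET /index.html HTTP/1.0" 200 2326 '
         . '"http://example.com/start.html" "Mozilla/4.08"';
my ($host, $ident, $user, $time, $request, $status, $size, $referer, $agent) =
    $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;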
Would it still be useful as a filter if you use Compress::Zlib? I don't want to pass more data into the script than necessary. Still thinking about the idea.
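If it grew a Compress::Zlib mode, the read loop would look roughly like this (a sketch; the filename and pattern are just examples):

use Compress::Zlib;

# sketch: read a rotated, gzipped log directly instead of having
# zgrep/zcat feed the script (filename and pattern are examples)
my $gz = gzopen('access_log.1.gz', 'rb')
    or die "gzopen failed: $gzerrno\n";
my $line;
while ($gz->gzreadline($line) > 0) {
    print $line if $line =~ /2001:13:5/;    # same timestamp filter as above
}
$gz->gzclose;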
> Would it still be useful as a filter if you use Compress::Zlib?
It would only be useful in the cases where you zcat to your filter first, then grep on the resulting format. For example, grepping on a URL could bring up referers and requests alike, so if you filter the referers out first, you can then safely grep for requests. The same goes for other fields: status codes vs. sizes vs. IPs.
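Concretely, something like this (I'm assuming a %r-style specifier for the request line here; logparse's actual letters may differ):

# print only the request field, then grep; referers that happen to
# contain the same URL never reach the grep (%r is my assumption)
zcat access_log.1.gz | logparse -p="%r" | grep 'index.html'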