in reply to Efficient Way to Parse a Large Log File with a Large Regex

Save your list of IPs to a database and check each log entry against this DB as soon as the entry is written.

If you can 'capture' the writes to the log file and pipe them to a Perl program that extracts the IPs and checks them against the database, that seems feasible.
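A minimal sketch of such a filter, assuming the log lines arrive on STDIN and that the blocked-IP list has already been loaded from the database into a hash (the hard-coded IPs below stand in for a hypothetical DBI query):

```perl
use strict;
use warnings;

# In the real program this hash would be filled from the database
# (e.g. via DBI); hard-coded here purely for illustration.
my %blocked = map { $_ => 1 } ('10.0.0.1', '192.168.1.99');

# Return the first dotted-quad found in a log line, or undef.
# Common log format puts the client IP first on the line.
sub extract_ip {
    my ($line) = @_;
    return $line =~ /(\d{1,3}(?:\.\d{1,3}){3})/ ? $1 : undef;
}

while (my $line = <STDIN>) {
    my $ip = extract_ip($line);
    print "BLOCKED: $line" if defined $ip && $blocked{$ip};
}
```

You would then feed it the live log, e.g. `tail -f access.log | perl check_ips.pl`. The point of the hash is that each line costs one cheap exact-match lookup instead of one pass through a huge alternation regex.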

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law


Replies are listed 'Best First'.
Re^2: Efficient Way to Parse a Large Log File with a Large Regex
by Steve_p (Priest) on Apr 12, 2005 at 21:39 UTC

    This seems pretty reasonable. Additionally, you could create a simple POE process to tail the log file rather than piping through tail. There are several examples at the POE website. merlyn also has an article on his website about tailing a logfile and processing the entries.
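POE::Wheel::FollowTail is the POE component that does this in an event-driven way; the follow-the-file idea underneath it can be sketched in plain Perl as a function a polling loop would call repeatedly (the filename handling here is a simplified illustration, not the POE API):

```perl
use strict;
use warnings;

# Read any lines appended to $file since the byte offset in $$pos_ref,
# and update the offset. A tailer repeats this in a polling loop;
# POE::Wheel::FollowTail wraps the same idea behind an InputEvent.
sub read_new_lines {
    my ($file, $pos_ref) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    seek $fh, $$pos_ref, 0;      # resume where the last pass stopped
    my @lines = <$fh>;
    $$pos_ref = tell $fh;        # remember how far we got
    close $fh;
    return @lines;
}
```

A real tailer would call this every second or so (or let POE schedule it) and hand each returned line to the IP check.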

Re^2: Efficient Way to Parse a Large Log File with a Large Regex
by tlm (Prior) on Apr 13, 2005 at 01:06 UTC

    It's fun to read all the replies; there are a lot of good ideas here. I don't have anything new to add, other than this pointer to a Perl snippet by Lincoln Stein for using a DBMS for httpd logging. This approach reduces the problem of parsing log files to the much cleaner one of constructing SQL queries. And, as CountZero already pointed out, you can build in some hooks for preprocessing the log records, including one that checks them against your table of IP addresses. Then all you have to do is check the entries recorded with a timestamp more recent than the last check. (Incidentally, I vote for holli's hash lookup approach.)
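With the log in a DBMS, the "entries newer than the last check" step really is just a query. A sketch, with entirely hypothetical table and column names:

```sql
-- Hypothetical schema: access_log(remote_ip, logged_at),
-- blocked_ips(ip); :last_check is the timestamp of the previous run.
SELECT l.remote_ip, l.logged_at
FROM   access_log  l
JOIN   blocked_ips b ON b.ip = l.remote_ip
WHERE  l.logged_at > :last_check;
```

The join lets the database do the IP matching with an index, instead of Perl running a large regex over every line.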

    the lowliest monk