vsailas has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I seek the Monks' wisdom on detecting web server abuse and preventing robot-like page access.
Detection is to be done on the web server's access log file, with entries like:
200.1.1.62 - - [31/Dec/2007:01:21:42 -0500] "POST /cgi-bin/idb/1.0/portal/adduser.cgi HTTP/1.1" 200 4266
200.1.1.62 - - [31/Dec/2007:01:21:43 -0500] "GET /idb/1.0/html/portal/cust.css HTTP/1.1" 304 -

For example, I would like to flag more than 200 accesses per hour from a particular IP, detect alphabetical traversal of the site, and so on. Please guide me on how to accomplish this, and if possible point me to sample code or articles on the subject.

Thank you in advance.

Replies are listed 'Best First'.
Re: Web server abuse detection
by marto (Cardinal) on Dec 31, 2007 at 08:03 UTC
    I am sure someone will point you towards a better solution; however, you could start by looking at something like this, or by using mod_rewrite to block users based on IP.

    Hope this helps

    Martin
Re: Web server abuse detection
by bradcathey (Prior) on Dec 31, 2007 at 08:35 UTC

    Though it may be overkill for this one specific application, we have been using Squid on our servers for some time with good results. It can be a bit tricky to set up, but it's a solid app with lots of options.


    —Brad
    "The important work of moving the world forward does not wait to be done by perfect men." George Eliot
Re: Web server abuse detection
by locked_user sundialsvc4 (Abbot) on Dec 31, 2007 at 17:09 UTC

    The Apache log-file format is pretty standard and there are CPAN tools which can easily parse and analyze it.
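
    As a rough illustration of how regular the format is, here is a minimal sketch that parses one entry with a plain regex. It assumes the common log format shown in the original question; the name parse_access_line is made up for this example, and a CPAN parser would likely handle more edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line in Apache common log format
# into a hash reference. Returns undef if the line does not match.
sub parse_access_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+)\s+                        # client IP or host name
        (\S+)\s+(\S+)\s+                 # identd and auth user, often a dash
        \[([^\]]+)\]\s+                  # timestamp between square brackets
        "(\S+)\s+(\S+)(?:\s+(\S+))?"\s+  # method, path, optional protocol
        (\d{3})\s+                       # HTTP status code
        (\d+|-)                          # response size in bytes, or a dash
    }x;
    return {
        ip     => $1,
        time   => $4,
        method => $5,
        path   => $6,
        status => $8,
        bytes  => $9 eq '-' ? 0 : $9,
    };
}

# Example using one of the entries from the question.
my $rec = parse_access_line(
    '200.1.1.62 - - [31/Dec/2007:01:21:42 -0500] '
  . '"POST /cgi-bin/idb/1.0/portal/adduser.cgi HTTP/1.1" 200 4266'
);
printf "%s %s %s %s\n", @{$rec}{qw(ip method path status)} if $rec;
```

    Lines that do not match (for example, entries in a custom LogFormat) simply yield undef, so the caller can skip or count them.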

    A Perl script could be launched at regular intervals, e.g. by a cron job on Unix, to scan the current log file and build up answers concerning it. This script would run in the context of a local user, not the Apache-server userid, and would be able to see and interpret the contents of the log file as long as that user has read permission on it. (It could, of course, place its findings into a file that a web page could “see.”)
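
    A minimal sketch of what such a scan might look like for the "more than 200 accesses per hour" case from the question. The function name find_heavy_hitters and the bucketing by clock hour are assumptions made for this example, not an established tool; a sliding one-hour window would need real timestamp arithmetic.

```perl
use strict;
use warnings;

# Hypothetical helper: count requests per client IP per clock hour and
# return a hash reference of the "IP hour" buckets above $threshold.
sub find_heavy_hitters {
    my ($lines, $threshold) = @_;
    my %hits;    # "IP dd/Mon/yyyy:hh" => request count
    for my $line (@$lines) {
        # Capture the IP and the timestamp truncated to the hour,
        # e.g. "31/Dec/2007:01" from [31/Dec/2007:01:21:42 -0500].
        next unless $line =~ m!^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}:\d{2}):!;
        $hits{"$1 $2"}++;
    }
    my %over;
    for my $key (keys %hits) {
        $over{$key} = $hits{$key} if $hits{$key} > $threshold;
    }
    return \%over;
}

# Demo on the two entries from the question, with a threshold of 1
# so that the sample data triggers a report.
my @sample = (
    '200.1.1.62 - - [31/Dec/2007:01:21:42 -0500] "POST /cgi-bin/idb/1.0/portal/adduser.cgi HTTP/1.1" 200 4266',
    '200.1.1.62 - - [31/Dec/2007:01:21:43 -0500] "GET /idb/1.0/html/portal/cust.css HTTP/1.1" 304 -',
);
my $over = find_heavy_hitters(\@sample, 1);
printf "%s: %d requests\n", $_, $over->{$_} for sort keys %$over;
```

    In a real cron job the lines would come from the log file rather than an in-memory list, and the threshold would be 200. Note that a client making 150 requests just before the top of the hour and 150 just after would slip past clock-hour buckets.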

    Again, “you are not the first soul to have thought of this,” so search CPAN very aggressively for tools that can help you. In many ways the greatest strength of this language is not the language itself, but rather CPAN.