in reply to Recognizing repetitive spammers using a time base method

What's working reasonably well is maintaining a cache of IP addresses with Cache::Cache, either in memory or on disk. If the requesting IP is already in the cache, block the request. Experiment with the cache expiration setting.
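
A rough sketch of the idea, assuming a plain CGI script and the Cache::FileCache backend from the Cache::Cache distribution (the namespace and the ten-minute expiry are just example values):

use strict;
use warnings;
use Cache::FileCache;

# On-disk cache shared by every CGI process; entries expire on their own.
my $cache = Cache::FileCache->new({
    namespace          => 'spam_blocker',
    default_expires_in => 600,            # forget an IP after ten minutes
});

my $ip = $ENV{'REMOTE_ADDR'} || '';

if ( defined $cache->get($ip) ) {
    # Seen this IP within the expiry window -- refuse the request.
    print "Status: 403 Forbidden\n";
    print "Content-Type: text/plain\n\n";
    print "Too many requests -- try again later.\n";
    exit;
}

# First sighting (or the old entry expired): remember it and carry on.
$cache->set( $ip, time );

Cache::MemoryCache has the same interface, but it only lives as long as one process, so the file-backed cache is usually the one you want under plain CGI.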

This will be rough on requests that come through proxy farms (like AOL's), because many different users' requests will appear to come from the same IP, but for low-traffic sites it's good enough.


Replies are listed 'Best First'.
Re^2: Recognizing repetitive spammers using a time base method
by jhourcle (Prior) on Mar 10, 2005 at 03:57 UTC

    You can better deal with proxies by caching not just $ENV{'REMOTE_ADDR'} but something like

    join '|', $ENV{'REMOTE_ADDR'}, $ENV{'HTTP_X_FORWARDED_FOR'} || '';

    It's not perfect, but it's better than just REMOTE_ADDR on its own.
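
    Dropped into the parent node's cache, that might look something like this (a sketch only; note that under CGI the forwarded header shows up in the environment as HTTP_X_FORWARDED_FOR):

    # Key the cache on the connecting address plus whatever forwarded-for
    # chain came with it, so two users behind the same proxy get distinct keys.
    my $key = join '|',
        $ENV{'REMOTE_ADDR'},
        $ENV{'HTTP_X_FORWARDED_FOR'} || '';

    if ( defined $cache->get($key) ) {
        print "Status: 403 Forbidden\nContent-Type: text/plain\n\nBlocked.\n";
        exit;
    }
    $cache->set( $key, time );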

    Update: this was to deal with the issue of false positives from proxies -- as with any tuning of this nature, reducing false positives can increase the risk of false negatives (i.e., letting more good stuff through has a chance of also letting more bad stuff through). It's up to each person to decide which of the two is worse, and what the acceptable false-negative / false-positive limits are. You might also try asking on the spamtools or spam-l lists.

      Except it won't catch spammers who know this trick and fill in their own X-Forwarded-For header. With bad guys, you cannot trust *anything* they send, and that includes almost all HTTP headers. The only headers you can trust when receiving requests from baddies are the ones a proxy inserted on the way through, and then only if you trust that proxy - for instance, you might decide to trust the AOL proxies, and hence the headers they insert.
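
      If you do decide to trust a particular set of proxies, one way to put that into practice is to honour X-Forwarded-For only when the connection itself comes from one of them (a sketch; the addresses below are placeholders, not AOL's real ranges):

      # Only believe X-Forwarded-For when the directly connected peer is a
      # proxy we have explicitly chosen to trust (placeholder addresses).
      my %trusted_proxy = map { $_ => 1 } qw( 192.0.2.10 192.0.2.11 );

      my $client = $ENV{'REMOTE_ADDR'};

      if ( $trusted_proxy{$client} and $ENV{'HTTP_X_FORWARDED_FOR'} ) {
          # Take the last address in the chain, which the trusted proxy added.
          my ($forwarded) = $ENV{'HTTP_X_FORWARDED_FOR'} =~ /([\d.]+)\s*$/;
          $client = $forwarded if defined $forwarded;
      }

      # $client is now the best available guess at the real origin;
      # use it (or fold it into the cache key) instead of REMOTE_ADDR alone.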