Recognizing repetitive spammers using a time base method

Hagbone has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for references, module, script, etc. that will help me with a "time" based method of catching email abusers .... the situation:

A public venue (forum) where visitors contact those who post messages. The contact process does not reveal the raw email address of the person being contacted, and it also records the contact event. The abusers of the system (spammers who send inappropriate/unsolicited messages) tend to do so in "bursts" .... many messages being sent to individuals who have posted, with a short time span in between messages. For automated spammers, the time span can be very short .... for the ones who manually paste in repeated content and manually submit (it happens), the time frame is longer.

Rather than reinvent the wheel, I'm wondering if something has already been written that alerts on, or flags, events that are occurring (from the same abuser) within a specified time range.

I've tried searching the archives, but using "spam" in the search phrase seems to guarantee that the results will be way too deep to wade through. At any rate, before digging in to program something that appears to be pretty straightforward, I thought it made sense to see if somebody's been there and done that in a way that would make anything I tried to put together look awfully weak ;)

Re: Recognizing repetitive spammers using a time base method by saintmike (Vicar) on Mar 10, 2005 at 00:37 UTC
What's working reasonably well is maintaining a cache of IP addresses in Cache::Cache, either in memory or on disk. If you find the IP in the cache, block the request; otherwise add it. Experiment with the cache expiration setting. This will be tough on requests from proxy farms (like AOL), because every request will look like it's coming from the same IP, but for low-traffic sites, it's good enough.
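A minimal sketch of that idea, using a plain hash with expiry times as a stand-in for a Cache::Cache object so that it runs with core Perl only. The name `should_block` and the `$ttl` parameter are made up for illustration; in a real setup the cache module would handle expiration for you.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Cache::Cache object: IP address => expiry time.
my %seen;

# Returns 1 if the request should be blocked (the IP is still cached).
# Otherwise records the IP with an expiry of $now + $ttl and returns 0.
# $ttl is the expiration setting, in seconds, to experiment with.
sub should_block {
    my ($ip, $now, $ttl) = @_;

    # Drop expired entries so the hash does not grow without bound.
    for my $k (keys %seen) {
        delete $seen{$k} if $seen{$k} <= $now;
    }

    return 1 if exists $seen{$ip};

    $seen{$ip} = $now + $ttl;
    return 0;
}
```

Called as `should_block($ENV{REMOTE_ADDR}, time, 30)`, this lets one request per IP through every 30 seconds. A blocked request does not extend the expiry here; whether it should is a tuning decision.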
Re^2: Recognizing repetitive spammers using a time base method by jhourcle (Prior) on Mar 10, 2005 at 03:57 UTC
You can better deal with proxies by caching not just `$ENV{'REMOTE_ADDR'}` but something like `join '|', $ENV{'REMOTE_ADDR'}, $ENV{'HTTP_X_FORWARDED_FOR'};` It's not perfect, but it's better than just REMOTE_ADDR on its own. Update: this was to deal with the issue of false positives from proxies -- as with any sort of tuning of this nature, reducing the false positives can increase the risk of false negatives (i.e., letting more good stuff through has a chance of also letting more bad stuff through). It's up to each person as to which one of the two is worse, and what the acceptable false neg / false pos limits are. You might also try asking at the spamtools or spam-l lists.
Re^3: Recognizing repetitive spammers using a time base method by Anonymous Monk on Mar 10, 2005 at 11:57 UTC
Except it won't catch spammers that know this trick, and fill in their own X-Forwarded-For header. With bad guys, you cannot trust anything they send, and that includes almost all HTTP headers. The only time you can trust HTTP headers on requests from baddies is if the baddies go via a proxy, the proxy inserts the headers, and you trust the proxy. For instance, you might decide to trust the AOL proxies, and hence the headers they insert.
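One way to combine the two replies above is to use the forwarded header only when the direct peer is a proxy you have chosen to trust. This is a sketch under several assumptions: a CGI-style environment where the header shows up as `HTTP_X_FORWARDED_FOR`, a made-up `%trusted_proxy` list (the addresses are documentation placeholders), and the usual convention that a proxy appends the client address as the last entry of the header.

```perl
use strict;
use warnings;

# Hypothetical list of proxy addresses you have decided to trust.
my %trusted_proxy = map { $_ => 1 } qw(192.0.2.10 192.0.2.11);

# Build the cache key for a request from a CGI-style environment hash.
sub client_key {
    my ($env) = @_;
    my $addr = $env->{REMOTE_ADDR} // '';
    my $fwd  = $env->{HTTP_X_FORWARDED_FOR};

    # Only believe the forwarded header when the direct peer is trusted;
    # otherwise a spammer could vary it to dodge the cache.
    if ($trusted_proxy{$addr} && defined $fwd) {
        # Assumption: the trusted proxy appended the last entry itself,
        # so earlier entries (possibly forged by the client) are ignored.
        my $last = (split /\s*,\s*/, $fwd)[-1];
        return join '|', $addr, $last
            if defined $last && length $last;
    }

    return $addr;
}
```

Pass `\%ENV` from a CGI script. Requests from untrusted peers are keyed on REMOTE_ADDR alone, whatever headers they send.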
Re: Recognizing repetitive spammers using a time base method by TedPride (Priest) on Mar 10, 2005 at 10:08 UTC
Keep a log of posts containing user name, IP address, timestamp, and content hash. Check for posts from the same user name or IP within the last 30 seconds or so, or posts with the same content within a larger number of seconds. This won't stop someone from submitting dynamically generated posts every 31 seconds, but it should at least prevent the majority of spam.
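The check described above might look roughly like this. It is only a sketch: the log lives in memory, the name `check_post` and both window sizes are invented for the example, and the message body is assumed to be a byte string (encode it first if it may contain wide characters).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

# In-memory log of accepted posts; a real forum would persist this.
my @log;

# Hypothetical thresholds: 30 s per user/IP, 600 s for repeated content.
my ($SENDER_WINDOW, $CONTENT_WINDOW) = (30, 600);

# Returns a reason string if the post looks like part of a burst,
# otherwise logs it and returns an empty string.
sub check_post {
    my ($user, $ip, $content, $now) = @_;
    my $hash = sha256_hex($content);

    # Forget entries older than the larger of the two windows.
    @log = grep { $now - $_->{time} < $CONTENT_WINDOW } @log;

    for my $e (@log) {
        my $age = $now - $e->{time};
        return 'same sender too soon'
            if $age < $SENDER_WINDOW
            && ($e->{user} eq $user || $e->{ip} eq $ip);
        return 'repeated content' if $e->{hash} eq $hash;
    }

    push @log, { user => $user, ip => $ip, time => $now, hash => $hash };
    return '';
}
```

Rejected posts are not logged here, so a spammer who keeps retrying is measured against their last accepted post. Logging rejections as well would make the block "sticky" at the cost of more false positives.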
Re: Recognizing repetitive spammers using a time base method by nmerriweather (Friar) on Mar 11, 2005 at 21:19 UTC
http://modperl.com:9001/book/chapters/ch6.html Check out the section "Blocking Greedy Clients" -- and Apache::SpeedLimit, listing 6.4. I have a damn good memory!