I suggest you implement all three of these features. I reduced spam by doing the following:
- Give each user a unique ID; each comment gets a unique ID as well. These IDs are time-limited.
- If the user ID and the comment ID do not belong to the same user: redlist the user (a sketch of this ID check follows this list).
- If a user submits more than 5 comments per hour, something might be wrong: redlist the user.
- If a user submits more than 4 URLs in one message without at least 10 words in between: redlist the user (these heuristics are also sketched below).
- If any of the submitted text matches one of the keywords I keep in an external file: blacklist forever.
- If a redlisted user submits the same text twice in a row: blacklist the IP for 10 minutes.
- If that user later submits any of the keywords from that saved text, from the same IP: blacklist forever (the last sketch below covers these two rules).
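To give an idea of the ID scheme, here is a simplified sketch, not my exact code; which server variables go into the hash, and the 10-15 minute lifetime, are just the ones mentioned in the notes below:

    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    my $MAX_AGE = 15 * 60;    # token lifetime in seconds

    # Build a user ID (or comment ID) from server-side data plus the user's
    # "keycode"; REMOTE_ADDR / HTTP_USER_AGENT stand in for "server variables".
    sub make_id {
        my ( $user, $keycode, $time ) = @_;
        $time ||= time();
        my $hash = md5_hex( join '|',
            $user, $time, $keycode,
            $ENV{REMOTE_ADDR}     || '',
            $ENV{HTTP_USER_AGENT} || '' );
        return "$time:$hash";    # the issue time travels along with the hash
    }

    # Verify: recalculate the hash on the server and check the age.
    sub id_is_valid {
        my ( $id, $user, $keycode ) = @_;
        return 0 unless defined $id;
        my ( $time, $hash ) = split /:/, $id, 2;
        return 0 unless $time && $hash;
        return 0 if time() - $time > $MAX_AGE;    # expired
        # Forged ID, wrong user, changed keycode or changed environment all fail.
        return make_id( $user, $keycode, $time ) eq $id;
    }

    # The rule above: the user ID and the comment ID must both verify for the
    # SAME user, otherwise the user gets redlisted.
    sub ids_match_user {
        my ( $user_id, $comment_id, $user, $keycode ) = @_;
        return id_is_valid( $user_id,    $user, $keycode )
            && id_is_valid( $comment_id, $user, $keycode );
    }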
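The redlist heuristics themselves boil down to small checks like these (again simplified; in reality the per-user timestamps live in the session store and the keywords in the external file mentioned above):

    use strict;
    use warnings;

    # More than 5 comments in the last hour from one user => redlist.
    sub too_many_comments {
        my ($timestamps) = @_;    # arrayref of epoch seconds, one per comment
        my $hour_ago = time() - 3600;
        my @recent = grep { $_ >= $hour_ago } @$timestamps;
        return @recent > 5;
    }

    # More than 4 URLs without at least 10 words between them => redlist.
    sub too_many_urls {
        my ($text) = @_;
        my @spans;    # start/end position of every URL in the message
        while ( $text =~ m{https?://\S+}ig ) {
            push @spans, [ $-[0], $+[0] ];
        }
        return 0 if @spans <= 4;
        for my $i ( 0 .. $#spans - 1 ) {
            my $gap = substr $text, $spans[$i][1],
                $spans[ $i + 1 ][0] - $spans[$i][1];
            my $words = () = $gap =~ /\S+/g;
            return 1 if $words < 10;    # two URLs too close together
        }
        return 0;
    }

    # Submitted text matches a keyword from the external file => blacklist forever.
    sub matches_keyword_file {
        my ( $text, $file ) = @_;
        open my $fh, '<', $file or return 0;
        while ( my $kw = <$fh> ) {
            chomp $kw;
            next unless length $kw;
            return 1 if index( lc $text, lc $kw ) >= 0;
        }
        return 0;
    }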
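And the two "repeat offender" rules for redlisted users could look roughly like this; the plain hashes are only for illustration, the real state would of course live in the session store or a database:

    use strict;
    use warnings;

    my %last_text;         # user => last text submitted while redlisted
    my %ip_block_until;    # ip   => epoch time until which that IP is blocked
    my %saved_words;       # ip   => words remembered from the repeated spam text
    my %ip_blacklist;      # ip   => permanently blacklisted

    sub check_redlisted_post {
        my ( $user, $ip, $text ) = @_;

        return 'blacklisted' if $ip_blacklist{$ip};
        return 'blocked'     if ( $ip_block_until{$ip} || 0 ) > time();

        # Same text twice in a row while redlisted: block the IP for 10
        # minutes and remember the words of that text.
        if ( defined $last_text{$user} && $last_text{$user} eq $text ) {
            $ip_block_until{$ip} = time() + 10 * 60;
            $saved_words{$ip}{ lc $_ } = 1 for split /\s+/, $text;
            return 'blocked';
        }

        # Any of those remembered words showing up again from the same IP:
        # blacklist that IP forever.
        if ( $saved_words{$ip} ) {
            for my $word ( split /\s+/, lc $text ) {
                if ( $saved_words{$ip}{$word} ) {
                    $ip_blacklist{$ip} = 1;
                    return 'blacklisted';
                }
            }
        }

        $last_text{$user} = $text;
        return 'ok';
    }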
A few things to note:
- A redlisted user is a user "on probation" who is tracked a bit more closely than normal users, because their "risk-of-spam" score is higher than that of a normal poster.
- The IDs are all MD5 hashes, recalculated on the server from one or more server variables plus a "keycode" that can be changed at any time for any user (you never know, these spammers might get superdupermegacomputing to do their business ;)).
- I wanted to stay clear of spam for the upcoming MONTHS, not weeks. I tried a unique MD5 ID on its own; it works, but they started submitting one by one. I tried IP blacklists; they work, but the spammers use proxies and rotating IP addresses. I added checks on the proxy/Via headers too; they still showed up from yet another IP address. I got so sick of it that I disabled a lot of boards, until I found that this method is still working after 7 months and more...
- I use a ramdisk to keep these sessions temporarily, via File::Cache; 64 MB should be enough(tm). There is a small sketch of this after these notes.
- A session should not last longer than 10 to 15 minutes. When a session expires you can keep the form contents and ask the user to either log in again or preview again, because the session has expired. This happened only 3 times in a year, so I just re-present the form and tell the user the session has expired. Maybe I will add a CAPTCHA at that step; it might be a good idea, given the links I have seen from other monks up here.
- I am thinking of reducing spam even further by sharing all the "spammers-delight" data across all users of the server(s); that way every user is protected against a spammer any one of them has seen.
- All of this could be combined with a database, which you can use to moderate messages for better control.
This way I have reduced spam from 150-200 messages a day to at most 3 messages a week per user. I still keep the spam as a measurement tool, to check that the system really works without too many false positives, and it is currently working like a charm on my customers' comment boards and classified ads.
I am even thinking (out loud) about making this information freely available as XML files that could be used to block spammers in web-based applications, but maybe that is just dreaming? I don't know; it would surely affect my own system, because the spammers could then see exactly which words they are being blocked for, since all the data would be public...
Maybe Bayesian filtering would be an idea against this kind of misuse of web-based applications? Anyone?
Last but not least: combating spammers is never about a single method. As soon as spammers find out which technique you use to block them, they will find ways around it so they can still spray-paint the walls of your website with their pesky unwanted messages. It's a never-ending war; the more methods you implement, the more they will despair of getting through your protection.
For some time now spammers have been using "realistic-looking messages" to beat even the most intelligent anti-spam systems and get through to your desktop. A computer will never know for certain whether a message is valid; it can only guess by way of your algorithms. But it is still better to be 1% frustrated with 99% of the spammers shielded off than to be 100% frustrated .. no?