Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Just wondering what is the best spam prevention method?

For example, I have a user signup form and want to prevent any spammed member accounts being created. Now I know my site is not sooo popular that the threat of being spammed out is high, but I decided to take on a new project and would like to learn this out of interest.

Some ways I thought of:
1.) CAPTCHA?
2.) CODE VERIFICATION?
3.) IP DETECTION/BLOCKING?? --how would this work?

Any suggestions and insights would be great.

Thanks

Re: Best SPAM prevention method: Web App's
by freakingwildchild (Scribe) on May 01, 2007 at 22:07 UTC
    I suggest you implement all three of these features. I reduced spam by doing the following:

    • Give each user a unique ID; comments also have a unique ID. This ID is time limited.
    • If the unique ID and the comment ID are not from the same user: redlist the user.
    • If a user submits more than 5 comments per hour, something might be wrong: redlist the user.
    • If a user submits more than 4 URLs in one message without at least 10 words in between: redlist the user.
    • If any of the submitted text matches a keyword I keep in an external file: blacklist forever.
    • If that user sends the same text twice in a row while redlisted: blacklist the IP for 10 minutes.
    • If that user later sends any of the keywords from that saved text from the same IP: blacklist forever.
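
    A minimal sketch of three of the rules above (keyword match, posts per hour, URL density). The function name classify_post, its arguments, and the reading of "10 words in between" as "every gap between two URLs has at least 10 words" are my assumptions, not the poster's actual code:

```perl
use strict;
use warnings;

# Thresholds taken from the list above.
my $MAX_POSTS_PER_HOUR = 5;
my $MAX_URLS           = 4;
my $MIN_WORDS_BETWEEN  = 10;

# Classify one submission as 'ok', 'redlist' or 'blacklist'.
# $text: message body; $recent_posts: arrayref of epoch times of the user's
# earlier posts; $keywords: arrayref of banned words; $now: current epoch time.
sub classify_post {
    my ($text, $recent_posts, $keywords, $now) = @_;

    # Keyword hit: blacklist immediately.
    for my $word (@$keywords) {
        return 'blacklist' if $text =~ /\b\Q$word\E\b/i;
    }

    # More than 5 posts in the last hour (counting this one): redlist.
    my $last_hour = grep { $now - $_ < 3600 } @$recent_posts;
    return 'redlist' if $last_hour + 1 > $MAX_POSTS_PER_HOUR;

    # More than 4 URLs with a gap of fewer than 10 words: redlist.
    my @parts = split m{https?://\S+}i, $text, -1;
    my $urls  = @parts - 1;
    if ($urls > $MAX_URLS) {
        # Elements 1 .. $#parts-1 are the texts between consecutive URLs.
        for my $gap (@parts[1 .. $#parts - 1]) {
            my $words = () = $gap =~ /\w+/g;
            return 'redlist' if $words < $MIN_WORDS_BETWEEN;
        }
    }
    return 'ok';
}
```

    The caller would keep the per-user post times and the redlist/blacklist state somewhere persistent; that part is left out here.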

    A few things to note:

    1. A redlisted user is a user "on probation" who is tracked a little more closely than normal users, because their "risk-of-spam" score is higher than normal.
    2. The IDs are all MD5 hashes, recalculated on the server from one or more server variables plus a "keycode" which can be changed at any time for any user (you never know, these spammers might get superdupermegacomputing to do their business ;))
    3. I wanted to stay clear for the coming MONTHS, not weeks. I tried a unique MD5 ID on its own: it works, but they started submitting one by one. I tried IP blacklists: they work, but the spammers use proxies and other IP addresses. I added proxy/Via parameters too; still they used another IP address. I got so sick of it that I disabled a lot of boards, until I found that this method was still working after 7 months and more ...
    4. I use a ramdisk to keep these temporary sessions with File::Cache; 64 MB should be enough(tm).
    5. A session should not last longer than 10 to 15 minutes. When the session expires you can still keep the form contents and ask the user to either log in again or preview again. I had this only 3 times in a year, so I just present the form again and tell the user the session has expired. Maybe I will add CAPTCHA at that step; it might be a good idea, given the links I have seen from other monks up here.
    6. I am thinking of reducing spam even more by sharing all the "spammers-delight-data" across all users of the server(s); that way all users are protected against one kind of spammer.
    7. All this "could" be used with a database, which you can use to moderate messages for better control.
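
    To illustrate note 2 and the time limit from note 5, here is a rough sketch of a time-limited MD5 token that the server recomputes on submit. The names make_token and check_token, the "issued-hash" layout, and the choice of user ID plus client IP as the server variables are assumptions for illustration only:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $TTL = 15 * 60;    # token lifetime in seconds (the 10 to 15 minutes above)

# Build "issued_at-hash" from user ID, client address, issue time and keycode.
sub make_token {
    my ($user_id, $ip, $keycode, $now) = @_;
    return "$now-" . md5_hex(join '|', $user_id, $ip, $now, $keycode);
}

# Recompute the hash on the server and enforce the time limit.
sub check_token {
    my ($token, $user_id, $ip, $keycode, $now) = @_;
    my ($issued, $hash) = $token =~ /\A(\d+)-([0-9a-f]{32})\z/ or return 0;
    return 0 if $issued > $now || $now - $issued > $TTL;
    return $hash eq md5_hex(join '|', $user_id, $ip, $issued, $keycode) ? 1 : 0;
}
```

    Changing the keycode invalidates every outstanding token at once. If you write this today, hmac_sha256_hex from Digest::SHA would be a sturdier choice than plain MD5, but the shape stays the same.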
    That way I have reduced spam from 150-200 messages a day to at most 3 messages a week per user. I still keep the spam as a measurement tool, to see whether the system is really working without too many false positives; it is currently working like a charm on my customers' comment boards and classified ads.

    I am even thinking out loud about making such information freely available in XML files which could be used to block spammers in web-based applications, but that might be dreaming? I don't know; it would surely affect my own system, because with all the data public the spammers could also see which words get them blocked...

    Maybe Bayesian filtering would be an idea for this kind of misuse of web-based applications? Anyone?
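
    For anyone curious what that would look like, a toy naive Bayes scorer with add-one smoothing fits in a few lines. This is only a sketch: the names train and spam_log_odds and the model layout are made up for illustration, and it assumes at least one spam and one ham message have been trained before scoring:

```perl
use strict;
use warnings;

# Count how often each word appears in messages labelled 'spam' or 'ham'.
sub train {
    my ($model, $label, $text) = @_;
    $model->{docs}{$label}++;
    for my $w (lc($text) =~ /\w+/g) {
        $model->{words}{$label}{$w}++;
        $model->{total}{$label}++;
        $model->{vocab}{$w} = 1;
    }
}

# Return the log-odds that $text is spam; positive means "more likely spam".
sub spam_log_odds {
    my ($model, $text) = @_;
    my $v    = keys %{ $model->{vocab} };    # vocabulary size
    my $odds = log($model->{docs}{spam} / $model->{docs}{ham});
    for my $w (lc($text) =~ /\w+/g) {
        # Add-one smoothing so unseen words do not zero out the estimate.
        my $p_spam = (($model->{words}{spam}{$w} // 0) + 1) / ($model->{total}{spam} + $v);
        my $p_ham  = (($model->{words}{ham}{$w}  // 0) + 1) / ($model->{total}{ham}  + $v);
        $odds += log($p_spam / $p_ham);
    }
    return $odds;
}
```

    The kept spam mentioned above would be the natural training set for the 'spam' side; approved comments would feed the 'ham' side.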

    Last but not least: combating spammers cannot rest on one method. As soon as spammers find out how you block them, they will find ways around it to keep posting their pesky unwanted messages and spraypaint the walls of your website. It's a never-ending war; the more methods you implement, the more desperate they get trying to get through your protection.

    For some time now, spammers have been using "realistic looking messages" to beat even the most intelligent anti-spam systems and reach your desktop. A computer will never know whether a message is valid; it can only guess using your algorithms. But it's still better to be 1% frustrated with 99% of spammers shielded off than 100% frustrated, no?

Re: Best SPAM prevention method: Web App's
by kyle (Abbot) on May 01, 2007 at 18:41 UTC
Re: Best SPAM prevention method: Web App's
by gloryhack (Deacon) on May 01, 2007 at 21:55 UTC
    I'm not sure what you mean by "code verification", but I've been very successful with both CAPTCHA and blacklisting. I blacklist both by IP address and user agent, since some user agents are clearly not human users or responsible spiders.

    Blocking by IP is pretty easy stuff. Grab $ENV{REMOTE_ADDR} and compare that value to your list of blacklisted hosts. You could get really crafty and use, say, Net::IP::Match or Net::IP::Match::XS to get out of writing a lot of code. With one of those two modules, you could simply do:

    if (match_ip($ENV{'REMOTE_ADDR'}, @blacklist)) {
        # tell user to take a hike
    }
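
    If neither module is installed, the same check can be hand-rolled for IPv4 with the core Socket module. A rough sketch; the name ip_in_blacklist and the "a.b.c.d/len" blacklist format are assumptions of mine, and IPv6 is not handled:

```perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Return 1 if IPv4 address $ip falls inside any "a.b.c.d/len" block.
# A bare address in the list is treated as /32.
sub ip_in_blacklist {
    my ($ip, @blocks) = @_;
    # Accept dotted quads only, so inet_aton never attempts a DNS lookup.
    return 0 unless defined $ip && $ip =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
    my $packed = inet_aton($ip) or return 0;
    my $addr = unpack 'N', $packed;
    for my $block (@blocks) {
        my ($net, $len) = split m{/}, $block;
        $len = 32 unless defined $len;
        next unless $net =~ /\A\d{1,3}(?:\.\d{1,3}){3}\z/;
        my $net_packed = inet_aton($net) or next;
        # Network mask with the top $len bits set.
        my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
        return 1 if ($addr & $mask) == (unpack('N', $net_packed) & $mask);
    }
    return 0;
}
```

    Usage would be the same as above: call ip_in_blacklist($ENV{REMOTE_ADDR}, @blacklist) and turn the visitor away when it returns true.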

    You'd have to roll your own to do a match on user agent strings, but that's dirt simple stuff.
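
    Something along these lines would do for the user agent side. The patterns are hypothetical placeholders; a real list should come from your own logs, and agent_is_blacklisted is just a name picked for the sketch:

```perl
use strict;
use warnings;

# Hypothetical patterns: an empty agent string plus two example substrings.
my @bad_agents = (qr/^$/, qr/EmailCollector/i, qr/libwww-perl/i);

# Return 1 if the agent string matches any blacklisted pattern.
# A missing User-Agent header is treated like an empty one.
sub agent_is_blacklisted {
    my ($agent) = @_;
    $agent = '' unless defined $agent;
    for my $re (@bad_agents) {
        return 1 if $agent =~ $re;
    }
    return 0;
}
```

    In a CGI script the value to test is $ENV{HTTP_USER_AGENT}.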

    CAPTCHA need not use images. If you have a long list of questions that can easily be answered by a cabbage, you can select from that list at random and ask the user to type his response.
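
    A bare-bones sketch of the question-and-answer idea. The questions, the names pick_question and check_answer, and the plan of keeping the question index in the server-side session are all assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical question list: [question text, expected answer].
my @questions = (
    ['What colour is grass?',                'green'],
    ['How many legs does a cat have?',       '4'],
    ['What is the first letter of "perl"?',  'p'],
);

# Pick a random question. Store the returned index in the session,
# not in a hidden form field, and show the text to the user.
sub pick_question {
    my $i = int(rand(@questions));
    return ($i, $questions[$i][0]);
}

# Compare the typed answer with the expected one, ignoring case
# and surrounding whitespace.
sub check_answer {
    my ($i, $answer) = @_;
    return 0 unless defined($answer) && defined($questions[$i]);
    $answer =~ s/^\s+|\s+$//g;
    return lc($answer) eq lc($questions[$i][1]) ? 1 : 0;
}
```

    The list needs to be long enough that a spammer cannot simply hard-code every answer.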

    Have fun with it!

Re: Best SPAM prevention method: Web App's
by swares (Monk) on May 01, 2007 at 23:16 UTC
    Looks like the other items were answered already. You might be interested in looking at Authen::Captcha for the first item. Looks like it would provide what you need.
Best SPAM prevention method: Web App's
by Anonymous Monk on May 01, 2007 at 18:36 UTC
    Title is supposed to be "Best SPAM prevention method: Web App's"
    --------------------
    I have RoboForm and am submitting my sites to directories, thus having the title field being my site title. :) --
      I actually thought the irony was intentional.