hmag has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I've been searching for a relatively spam-proof version of the Matt's Script Archive WWWBoard recently & after much searching, I found something which looks like it will do the job, while still allowing non-member posts into the message board.

However, there are numerous terms I want to prevent from being used, such as viagra, phetnermine, online poker, casino, porn, etc... etc...

To this end, there's an array within the actual board script to do this that looks like:

@restricted = ('viagra','xxx','gambling');

My question is - would it be possible to include the terms I wish to restrict in a file that the script pulls to reference when a message gets posted? If so, could someone outline how I might do that please - my Perl knowledge is very minimal.

Alternatively, has anyone got a variation of WWWBoard that is relatively bulletproof against spammers & spambots? I was using a version called 'Mr Fong Device', but after I added a number of terms to its autoban list, it rendered the board inaccessible to posts for everyone - thanks spammers - any help greatly appreciated.

Re: Including File in array
by GrandFather (Saint) on Jun 25, 2006 at 21:50 UTC

    Actually for fast lookup you would be better using a hash than an array. Something like this populates a hash from a file (substitute your file handle for DATA):

    use warnings;
    use strict;

    my %banned;
    @banned{do {local $/; split ' ', <DATA>}} = ();

    print join "\n", keys %banned;

    my $word = 'porn';
    print "\n\n$word is on the banned list\n" if exists $banned{$word};

    __DATA__
    p0rn
    viagra
    phetnermine
    online poker
    casino
    porn

    Prints:

    casino
    poker
    phetnermine
    viagra
    p0rn
    porn
    online

    porn is on the banned list
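For a real file instead of DATA, a minimal sketch along the same lines might look like the following. The helper names `load_banned` and `banned_words_in` are made up for illustration, and the temporary file only stands in for whatever word-list file the board script would read. It also assumes you want whole-word, case-insensitive matches.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read whitespace-separated terms from a file into a hash.
sub load_banned {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %banned;
    while (my $line = <$fh>) {
        $banned{lc $_} = 1 for split ' ', $line;
    }
    close $fh;
    return \%banned;
}

# Hypothetical helper: list each banned term that occurs as a whole word in $text.
sub banned_words_in {
    my ($banned, $text) = @_;
    my %seen;
    my @words = map { lc $_ } $text =~ /(\w+)/g;
    return grep { $banned->{$_} && !$seen{$_}++ } @words;
}

# Stand-in for the real word-list file (assumption: any whitespace separates terms).
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "viagra\ncasino porn\n";
close $tmp_fh;

my $banned = load_banned($tmp_name);
my @hits   = banned_words_in($banned, 'Cheap VIAGRA, best casino. Viagra!');
print "Rejected, found: @hits\n" if @hits;
```

This prints "Rejected, found: viagra casino". Because the file is split on whitespace, a phrase such as "online poker" ends up as two separate entries, exactly as in the DATA example.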

    DWIM is Perl's answer to Gödel
Re: Including File in array
by Hue-Bond (Priest) on Jun 25, 2006 at 21:33 UTC
    would it be possible to include the terms I wish to restrict in a file

    This is trivially done with Tie::File:

    $ cat banned_words
    pr0n
    hard
    visit
    $ perl
    use Tie::File;
    tie my @banned_words, 'Tie::File', 'banned_words' or die "tie: $!";
    print "banned words: @banned_words\n";
    untie @banned_words;
    __END__
    banned words: pr0n hard visit
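One thing worth knowing about the tied array is that changes to it are written back to the file, so the same mechanism can maintain the list as well as read it. A small sketch, using a temporary file as a stand-in for banned_words (the added term 'casino' is just an example):

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Stand-in for the banned_words file, one term per line.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "pr0n\nhard\nvisit\n";
close $fh;

tie my @banned_words, 'Tie::File', $file or die "tie: $!";
push @banned_words, 'casino';    # appended to the file as a new line
my $count = @banned_words;       # number of lines now in the file
untie @banned_words;

# Read the file back the ordinary way to confirm the new term is on disk.
open my $in, '<', $file or die "Cannot open $file: $!";
my @lines = <$in>;
close $in;
chomp @lines;
print "$count terms on disk: @lines\n";
```

If the board script should only ever read the list, a plain open and read avoids the risk of modifying the file by accident.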

    --
    David Serrano

Re: Including File in array
by derby (Abbot) on Jun 25, 2006 at 23:27 UTC

    I would just add that when looking to replace MSA stuff, your first stop should be NMS -- their wwwboard provides the banned-word-from-file functionality ... but davorg is probably better suited to speak on it.

    -derby
Re: Including File in array
by TedPride (Priest) on Jun 25, 2006 at 23:14 UTC
    Thing is, some of the words you're going to want to block if they're exact matches, others if a word starts with them, and others still if a word contains them anywhere. What you really need is a series of regexes and perhaps a scoring system:
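    use strict;
    use warnings;

    # Each entry is [pattern, weight]
    my @ban = (
        ['bleep\w*', 1],
        ['bloop\w*', 1],
        ['blark\w*', 2],
        ['blank', 1],
    );

    my $text = join '', <DATA>;
    print check($text);

    # Add up the weight of every pattern that matches at least once
    sub check {
        my $score = 0;
        my $text  = lc $_[0];
        $text =~ s/\W+/ /g;    # collapse punctuation and whitespace to single spaces
        for my $word (@ban) {
            $score += ($text =~ m/\b$word->[0]\b/) * $word->[1];
        }
        return $score;
    }

    __DATA__
    Bleeping blooper! Blark you! Blank!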
    You will probably also want to replace any bad words with substitutes (maybe CENSORED) if the score is more than 0 but less than whatever your cut-off is, but I'll leave that part up to you.
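A rough sketch of that censoring step, under a few assumptions: the cut-off of 3 is arbitrary, the helper name `moderate` is made up, and unlike the check above it counts every occurrence of a pattern rather than counting each pattern once.

```perl
use strict;
use warnings;

# Each entry is [pattern, weight], as in the scoring example.
my @ban = (
    [ 'bleep\w*', 1 ],
    [ 'blark\w*', 2 ],
    [ 'blank',    1 ],
);
my $cutoff = 3;    # assumed threshold; tune to taste

# Hypothetical helper: return (score, text with banned words replaced).
sub moderate {
    my ($text) = @_;
    my $score = 0;
    for my $rule (@ban) {
        my ($pattern, $weight) = @$rule;
        my $hits = () = $text =~ /\b$pattern\b/gi;    # count all matches
        $score += $hits * $weight;
        $text =~ s/\b$pattern\b/CENSORED/gi;
    }
    return ($score, $text);
}

for my $post ('Hello there', 'What a bleeping mess', 'Blark you, you blarking blank!') {
    my ($score, $clean) = moderate($post);
    if    ($score == 0)      { print "accept: $post\n" }
    elsif ($score < $cutoff) { print "accept censored: $clean\n" }
    else                     { print "reject (score $score)\n" }
}
```

With these sample posts the first is accepted as-is, the second comes out as "What a CENSORED mess", and the third is rejected with a score of 5.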

    My advice, however (as someone who used WWWBoard for years and was constantly trying to devise a system to stop spammers) is to forget trying to solve a problem that's impossible to solve. Instead of blocking posts based on their contents, just add a visual verification system that requires people to type in a code they see in a graphic. So long as the source graphics are your own, and not from some popular system that spammers have already cracked, you should end up reasonably spam-free. Just give people a cookie after the first verification that allows them to post for a few hours without further verifications, and add a system to block bots that try to brute force your verification. This way regular users won't be inconvenienced much, and spam bots won't be able to post at all.

    Of course, there may be a few real people who post obscene things, so you'll still want to replace bad words with CENSORED.

Re: Including File in array
by CountZero (Bishop) on Jun 26, 2006 at 06:13 UTC
    Just forget trying to solve the spam problem with simple solutions. A list-based approach will never catch variants such as: ViAGgRa, VVIIAAGGRRAA, V_I_A_G_R_A, V*I*A*G*R*A*, V I A G R A, VxIxAxGxRxA,...
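To make that concrete, here is a small sketch that runs those six variants past a plain word match, and then past a naive normaliser. The `normalise` routine is a made-up illustration, not a recommendation: it lower-cases, drops everything that is not a letter, and squeezes repeated letters.

```perl
use strict;
use warnings;

my @variants = ('ViAGgRa', 'VVIIAAGGRRAA', 'V_I_A_G_R_A',
                'V*I*A*G*R*A*', 'V I A G R A', 'VxIxAxGxRxA');

# Hypothetical normaliser: lower-case, drop non-letters, squeeze repeated letters.
sub normalise {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/[^a-z]//g;
    $text =~ tr/a-z//s;
    return $text;
}

my ($plain, $normalised) = (0, 0);
for my $v (@variants) {
    $plain++      if $v =~ /\bviagra\b/i;            # what a word list would catch
    $normalised++ if normalise($v) =~ /viagra/;      # after naive clean-up
}
print "plain list match: $plain of ", scalar @variants, "\n";
print "after normalising: $normalised of ", scalar @variants, "\n";
```

The plain match catches 0 of 6. The normaliser gets 5 of 6 but still misses VxIxAxGxRxA, and it would mangle legitimate text too (squeezing turns "book" into "bok", and dropping spaces runs words together), so every extra rule buys one more variant at the cost of more false positives.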

    We tend to forget, but spamming has become a high-tech area. Have a look at The Spammers' Compendium and despair.

    You would do much better to go for a tool like POPFile or Mail::SpamAssassin to classify your messages before they are allowed on the board.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Hi again, Many thanks guys - I'll have a look at that & see what I can work out. The site this board runs on is non-profit & I want to try & make it as simple for people to post as possible, while still being relatively spam-free. The format of WWWBoard is exactly what I'd like to retain, but the security of the modified 'Mr Fong Device' version has gotten to an unmanageable point now. I might try the NMS version if this other board doesn't work out or if I just can't figure out how to implement what people have suggested above.