Re: Robust Anti-Swear script

A couple of things to think about...

First, people *will* find ways around whatever filters you put in place. If you just want to stop the worst of the nasty words from showing up, you'll probably do OK at that, but your users will undoubtedly come up with new ways to say the same thing that won't be caught by your filter. What are you really trying to accomplish here? If it's a "letter of the law" situation, you've got half a chance. If you're trying to stop participants from communicating naughty ideas, you will fail.

Then, to make matters worse, the more words and permutations you try to filter out, the more false positives you'll catch. Context is everything: several years ago, AOL drew some bad publicity when breast cancer survivors were repeatedly dinged for using bad language in chat rooms and user profiles. That's one example. George Carlin sets forth several more in his "Seven Dirty Words" bit, like "You can prick your finger, but don't finger your prick." The more questionable words you try to block, the more legitimate conversation you will block unintentionally, and the more clever your users will get in their attempts to sidestep your bot.
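To make the false-positive problem concrete, here is a rough sketch. It is written in Python rather than Perl purely for illustration (the regexp ideas carry over directly), and the blocklist and sample sentences are made up from the examples above, not taken from any real filter. It compares a naive substring check with a whole-word regexp match:

```python
import re

# Hypothetical blocklist, using only words mentioned in this post.
BLOCKLIST = ["prick", "breast"]

def naive_hits(text):
    """Flag any blocklisted string found anywhere, even inside other words."""
    lowered = text.lower()
    return [word for word in BLOCKLIST if word in lowered]

# \b restricts each match to a whole word.
WORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, BLOCKLIST)) + r")\b",
    re.IGNORECASE,
)

def whole_word_hits(text):
    """Flag only blocklisted words that appear as complete words."""
    return [m.group(1).lower() for m in WORD_RE.finditer(text)]

samples = [
    "Mind the prickly pear.",
    "You can prick your finger.",
    "I am a breast cancer survivor.",
]
for sample in samples:
    print(sample, naive_hits(sample), whole_word_hits(sample))
```

Word boundaries stop "prickly" from being flagged, but the last two sentences are still caught even though both are perfectly innocent. No amount of regexp tuning fixes that, because the difference is in the context, not the spelling.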

Automating analysis of the English language is not something that can be done with a few Perl regexps. Work on your bot, sure, but consider using it to alert a human who can read the questionable content in context and take action, rather than having the bot take action all by itself.

(tye)Re: Robust Anti-Swear script by tye (Sage) on Jul 31, 2001 at 01:51 UTC
And once users realize that they are being filtered, some of them will find it a challenge to get past the filter, and so the filter may actually result in an increase in offensive chatting. I recall a case where the source of the filter was identified and chat like "Bob, that is a big, stinky pile of Azhrarn and you know it" started showing up. So one should probably make the filter very specific and rather simple, so that it isn't much of an interesting challenge to get past and just catches the casual first use of a forbidden word as a reminder that such is not appreciated. This will catch the occasional "slip" by a well-meaning user while being less likely to inflame much of a response to the filter itself. Though even that has a good chance of being more of a problem than it is worth.
- tye (but my friends call me "Tye")
Re: (tye)Re: Robust Anti-Swear script by Xxaxx (Monk) on Aug 02, 2001 at 23:57 UTC
Tye has some very good points. I run a couple of forums on which I've implemented a simple word list filter. I find that I've backed off to just a few very obvious words. If one of these words shows up in the text, I replace it with the Times Newspaper approved substitute and also send myself a message. Usually the substitution is enough to handle the immediate situation. If, when the human (or semi-human) moi reviews the alerts, further action is required, then further action is taken. But at this point the permutations of okay versus not okay are too complex for a simple bot to handle. I've found that the simple substitution has sent a gentle message that certain language is not approved, and most folks have either backed off or found a new and less offensive means of responding to each other. In the few cases where the user was obviously way over the line, I chose to implement a ban filter on that user. This was decided and implemented by the semi-human.
Claude
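For anyone who wants to try the substitute-and-alert approach described above, here is a minimal sketch. It is in Python rather than Perl, and it is not the actual forum code: the word list, the substitute text, the function name `filter_post`, and the in-memory alert list are all placeholders. A real setup would presumably send mail or write to a log instead.

```python
import re

# Hypothetical short word list and substitute text; both are placeholders.
FORBIDDEN = ["darn", "heck"]
SUBSTITUTE = "[expletive deleted]"

# Whole-word, case-insensitive match against the short list.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FORBIDDEN)) + r")\b",
    re.IGNORECASE,
)

def filter_post(user, text, alerts):
    """Return the cleaned text and queue an alert if anything was replaced."""
    cleaned, count = PATTERN.subn(SUBSTITUTE, text)
    if count:
        # The bot only substitutes and reports; a human decides on bans.
        alerts.append({"user": user, "original": text, "hits": count})
    return cleaned

alerts = []
print(filter_post("bob", "Well heck, that is a darn shame.", alerts))
print(filter_post("alice", "Check the docs first.", alerts))
print(len(alerts), "alert(s) waiting for review")
```

The bot never bans anyone on its own. It keeps the original text alongside the alert so that whoever reviews it can read the post in context before deciding whether anything further is needed.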