in reply to Re: Robust Anti-Swear script
in thread Robust Anti-Swear script

Yeah, that was something like my second idea. If I washed the text, it would be a bit easier to match against a wordlist.
I don't really need to handle case, though, since the /i modifier takes care of that. Also, I just thought of cases where multiple letters are substituted for one, like "ph" for "f". Would (ph|f) work on that for a word with the letter f in it? Or tr/(ph)/f/ ?
Hopefully not too many people have 45 sweaters. ;)
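
On the (ph|f) versus tr question, here is a minimal sketch of both options. It assumes a made-up wash() helper, an illustrative digit map, and "fudge" as a stand-in for a wordlist entry; none of these come from the original script. The one firm point is that tr/// only maps single characters, so tr/(ph)/f/ would turn each of "(", "p", "h" and ")" into "f". A multi-character sequence needs s///g.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise ("wash") text before matching.
sub wash {
    my ($text) = @_;
    my $washed = lc $text;
    $washed =~ s/ph/f/g;          # multi-character sequence: needs s///g, not tr
    $washed =~ tr/43015/aeois/;   # single characters: tr is fine (illustrative map)
    return $washed;
}

# Alternative: leave the text alone and accept both spellings in the pattern.
# "fudge" is a placeholder word, not part of any real list.
my $pattern = qr/\b(?:ph|f)udge\b/i;

print wash("Phudge"), "\n";
print wash("45sweaters"), "\n";   # shows why washing can create false matches
print "matched\n" if "PHUDGE!" =~ $pattern;
```

Washing keeps the wordlist patterns simple but can manufacture matches that were never in the input, while the alternation leaves the input alone at the cost of a more complicated pattern per word.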

Re: Re: Re: Robust Anti-Swear script
by elwarren (Priest) on Jul 31, 2001 at 01:30 UTC
    You just need a bigger wordlist. Since you're already going to need a list of bad words, why not put your leet speak words in there? Then you'll have fewer false matches on translated words like 45sweaters.

    Anything that is suspect (isn't found in wordlist but contains numbers) could then be translated and run through a second matching.
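
A rough sketch of that two-pass idea, assuming whole-word lookups in a hash. The placeholder words, the digit map, and the unleet() and is_bad() names are all invented for illustration.

```perl
use strict;
use warnings;

# Placeholder wordlist: plain words plus any leet spellings you already know.
my %bad = map { $_ => 1 } qw(darn heck d4rn);

# Illustrative leet translation; the digit map is an assumption.
sub unleet {
    my ($word) = @_;
    (my $plain = $word) =~ tr/43015/aeois/;
    return $plain;
}

sub is_bad {
    my ($word) = @_;
    my $w = lc $word;
    return 1 if $bad{$w};                # first pass: direct wordlist lookup
    return 0 unless $w =~ /\d/;          # no digits, so not suspect
    return $bad{ unleet($w) } ? 1 : 0;   # second pass: translate, look up again
}

my @flagged = grep { is_bad($_) } split ' ', "oh h3ck I own 45sweaters";
print "@flagged\n";
```

Because the second pass still looks up the whole translated word, a token like 45sweaters is translated but does not hit the list.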

    It's all in your approach.

      That wouldn't stop people typing l i k e  t h i s, or t*h*i*s or ASCII swearwords. Besides, you'll never keep up with all the different ways of using letters to come up with nasty words, or swearing in foreign languages, or highly offensive phrases that don't use swearwords, or any combination of the above or or or...

      Using a word list (and listing all the bad words you can think of) is not the right way to go. It's pointless to try and keep up with the amount of cr*p people can spew out; it'll always overtake you in the end. Aim to filter out most of it, and leave the rest to real people.

        You are correct, it is pointless to attempt to keep up with people who will try to avoid a system. But that is the entire point of his project, and not the part of the puzzle we're working on here.

        If he added the words "t h i s" to his wordlist, they would match, without the overhead of running every word through thousands of permutations. Point being that whatever algorithm he comes up with could be applied to the wordlist ahead of time, thus removing overhead during execution.
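
As a sketch of what "applied to the wordlist ahead of time" could look like: expand each word into its obfuscated variants once at load time, then compile a single pattern. The three variant rules, the placeholder words, and the variants() and contains_bad() names are assumptions for illustration only.

```perl
use strict;
use warnings;

my @base = qw(darn heck);   # placeholder wordlist

# Generate the variants of one word once, at load time.
sub variants {
    my ($word) = @_;
    my @chars = split //, $word;
    (my $leet = $word) =~ tr/aeois/43015/;
    return ($word, join(' ', @chars), join('*', @chars), $leet);
}

# Expand the list and compile one alternation up front, longest entries first.
my @expanded    = map { variants($_) } @base;
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @expanded;

# The lookarounds keep a hit from starting or ending inside a longer word.
my $bad_re = qr/(?<![a-z0-9])(?:$alternation)(?![a-z0-9])/i;

sub contains_bad { return $_[0] =~ $bad_re ? 1 : 0 }

print contains_bad("you d a r n fool"), "\n";
print contains_bad("darning socks"), "\n";
```

The per-message cost is then a single regex match, and adding a new obfuscation rule only changes the load-time expansion.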

        But I agree that people will always find a way around these systems, even if it's only for the pure fun of beating the system. Everyone else in this thread has made the point already, so I won't waste any more keystrokes reiterating it.