You just need a bigger wordlist. Since you're already going to need a list of bad words, why not put your leetspeak words in there too? Then you'll have fewer false matches on translated words like 45sweaters.
Anything that is suspect (it isn't found in the wordlist but contains numbers) could then be translated and run through a second matching pass.
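Here is a rough sketch of that two-pass idea in Python, just to make it concrete. The wordlist entries, the digit-to-letter table and the name `is_bad` are all made up for illustration; a real filter would need a far bigger list and mapping.

```python
# Hypothetical wordlist: plain words plus leetspeak spellings you already know about.
BAD_WORDS = {"darn", "d4rn", "heck"}

# Assumed digit-to-letter translation table; extend to taste.
LEET_MAP = str.maketrans({"4": "a", "3": "e", "1": "i", "0": "o", "5": "s", "7": "t"})

def is_bad(word):
    w = word.lower()
    # First pass: direct lookup in the wordlist.
    if w in BAD_WORDS:
        return True
    # Second pass: only suspects (not in the list, but containing digits)
    # are translated and looked up again.
    if any(ch.isdigit() for ch in w):
        return w.translate(LEET_MAP) in BAD_WORDS
    return False
```

Because the lookup is on whole words, something like 45sweaters gets translated but still doesn't match anything in the list.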
It's all in your approach. | [reply] |
That wouldn't stop people typing l i k e t h i s, or t*h*i*s or ASCII swearwords. Besides, you'll never keep up with all the different ways of using letters to come up with nasty words, or swearing in foreign languages, or highly offensive phrases that don't use swearwords, or any combination of the above or or or...
Using a word list (and listing all the bad words you can think of) is not the right way to go. It's pointless to try and keep up with the amount of cr*p people can spew out; it'll always overtake you in the end. Aim to filter out most of it, and leave the rest to real people.
| [reply] |
You are correct: it is pointless to try to keep up with people who are determined to get around a system. But that is the entire point of his project, and it isn't the part of the puzzle we're looking at here.
If he added the words "t h i s" to his wordlist, they would match, without the overhead of running every word through thousands of permutations. Point being that whatever algorithm he comes up with could be applied to the wordlist ahead of time, thus removing overhead during execution.
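Something along these lines, as a minimal Python sketch. The substitution table, the separator list and the function names are my own assumptions, not anything from the original project; the point is only that the expansion happens once, up front.

```python
from itertools import product

# Assumed per-letter alternatives (the letter itself plus one leet digit).
LEET_VARIANTS = {"a": "a4", "e": "e3", "i": "i1", "o": "o0", "s": "s5", "t": "t7"}
# Assumed separators people put between letters: none, a space, or a star.
SEPARATORS = ["", " ", "*"]

def expand(word):
    """Return every spelling of `word` under the assumed rules above."""
    choices = [LEET_VARIANTS.get(ch, ch) for ch in word]
    variants = set()
    for combo in product(*choices):
        for sep in SEPARATORS:
            variants.add(sep.join(combo))
    return variants

def build_wordlist(base_words):
    """Apply the expansion once, ahead of time, to the whole base list."""
    expanded = set()
    for w in base_words:
        expanded |= expand(w.lower())
    return expanded

WORDLIST = build_wordlist(["this"])

def contains_bad(text):
    # At run time the input is only lowercased and scanned for known variants.
    t = text.lower()
    return any(v in t for v in WORDLIST)
```

With this, `contains_bad("I typed t h i s")` and `contains_bad("or t*h*i*s")` both come back true without permuting the input at all. The substring scan is deliberately naive and would also flag a word like "thistle", so proper word-boundary handling is left out of the sketch.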
But I agree that people will always find a way around these systems, even if it's only for the pure fun of being better than the system. Everyone else in this thread has made the point already, so I won't waste any more keystrokes reiterating it.
| [reply] |