in reply to dirty word filter module?
Some documents containing the really nasty words can be blacklisted immediately. Others can be set aside for review if they contain words, or derivations of rude words, that might warrant attention. The rest will be let through. The decision is based on patterns and frequency rather than on individual words.
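To make the three outcomes concrete, here is a minimal sketch in Python. It is only my illustration of the idea, not an existing module: the placeholder words (`badword`, `mildword`), the patterns and the threshold of 2 are all assumptions that you would replace with your own lists and tuning.

```python
import re

# Hypothetical placeholder patterns; a real filter would maintain its own lists.
# Severe patterns cause an immediate blacklist on a single match.
SEVERE_PATTERNS = [re.compile(r"\bbadword\w*", re.IGNORECASE)]

# Milder patterns, including simple derivations such as digit substitutions.
MILD_PATTERNS = [re.compile(r"\bm[i1]ldw[o0]rd\w*", re.IGNORECASE)]

# Assumed threshold: this many mild matches sends the document to review.
REVIEW_THRESHOLD = 2


def count_matches(patterns, text):
    """Count every match of every pattern in the text."""
    return sum(1 for p in patterns for _ in p.finditer(text))


def classify(text):
    """Return 'blacklist', 'review' or 'pass' for a document."""
    if count_matches(SEVERE_PATTERNS, text) > 0:
        return "blacklist"
    if count_matches(MILD_PATTERNS, text) >= REVIEW_THRESHOLD:
        return "review"
    return "pass"
```

With these placeholder rules, a single mild match is let through, while repeated mild matches are held for a human to look at. Different audiences would get different pattern lists and a different threshold.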
The problem you face is that I haven't seen an implementation along these lines (and I would love to be corrected). You will also have to keep updating the patterns, words and rules to cope with the different things that people find offensive. A filter tuned for children's content would annoy adults.
One implementation that I have seen used a search engine and a number of complex stored queries that represented rude words. Documents that ranked highly were removed immediately. This approach could be adopted fairly easily since search engine software is readily available. The trick is to assemble the queries to run against the documents.
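I don't have the details of that implementation, so what follows is only a rough sketch of the stored-query idea, with a few lines of Python standing in for the search engine. The query names, terms, weights and the removal score of 0.5 are all hypothetical; a real search engine would supply its own query language and ranking.

```python
import re

# Hypothetical stored queries: each is a set of terms with a weight.
STORED_QUERIES = {
    "severe": {"terms": {"badword", "worseword"}, "weight": 3.0},
    "mild": {"terms": {"mildword"}, "weight": 1.0},
}

# Assumed cut-off: documents scoring at or above this are removed.
REMOVE_SCORE = 0.5


def score(text):
    """Weighted count of query-term hits, normalised by document length."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    total = 0.0
    for query in STORED_QUERIES.values():
        hits = sum(1 for token in tokens if token in query["terms"])
        total += query["weight"] * hits
    return total / len(tokens)


def rank(documents):
    """Return (score, name) pairs, highest score first."""
    return sorted(((score(text), name) for name, text in documents.items()),
                  reverse=True)


def to_remove(documents):
    """Names of the documents that rank high enough to be removed."""
    return [name for s, name in rank(documents) if s >= REMOVE_SCORE]
```

Normalising by length is one design choice among several: it stops a long document with a single mild word from outranking a short one that is mostly abuse.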