I don't know of an implementation that is specific to 'dirty' words, but spam filters, which rely on the frequency and occurrence patterns of certain words and phrases, are possibly a good model to use. Think of the data / documents that you are working with in the same way that an e-mail filter deals with mail.
Some documents containing the really nasty words can be blacklisted immediately. Some documents can be set aside for review if they contain words, or derivations of the rude words, that might warrant attention. The rest will be let through. The decision making is based on patterns and frequency rather than individual words.
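To make the three-way split above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the placeholder word lists, the weights, the threshold, and the `triage` function name are all hypothetical, and a real filter would need far better patterns and tuning.

```python
import re

# Hypothetical patterns and weights, invented purely for illustration.
BLOCK_PATTERNS = [r"\bverynastyword\b"]          # blacklist on sight
REVIEW_WEIGHTS = {r"\bdarn\w*\b": 1.0,           # milder words and derivations
                  r"\bheck\w*\b": 0.5}
REVIEW_THRESHOLD = 2.0                           # weighted hits per 100 words (assumed)

def triage(text):
    """Return 'block', 'review' or 'allow' for a document."""
    lowered = text.lower()
    # Tier 1: any really nasty word blocks the document outright.
    if any(re.search(p, lowered) for p in BLOCK_PATTERNS):
        return "block"
    # Tier 2: weighted frequency of milder words, normalised by length,
    # so one stray word in a long document does not trigger a review.
    n_words = max(len(re.findall(r"\w+", lowered)), 1)
    score = sum(w * len(re.findall(p, lowered))
                for p, w in REVIEW_WEIGHTS.items())
    if score * 100.0 / n_words >= REVIEW_THRESHOLD:
        return "review"
    # Tier 3: everything else is let through.
    return "allow"
```

Normalising by document length is one way of basing the decision on frequency rather than on a single word; other scoring schemes would work just as well.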
The problem that you face is that (and I would love to be corrected) I haven't seen an implementation along these lines. You will also have to keep updating the patterns, words and rules to cope with the different situations that people find offensive. A filter tuned for children's content, for example, would annoy adults.
One implementation that I have seen used a search engine and a number of complex stored queries that represented rude words. Documents that ranked highly were removed immediately. This approach could be adopted fairly easily, since search engine software is readily available. The trick is to assemble the queries to run against the documents.
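As a rough sketch of the stored-query idea, the toy scorer below stands in for a search engine. It is not how any particular search product works: the query names, terms, weights, cutoff, and function names are all hypothetical, and a real engine would add stemming, phrase matching and proper relevance ranking.

```python
import re
from collections import Counter

# Hypothetical stored queries: query name -> {term: weight}.
STORED_QUERIES = {
    "mild": {"darn": 1.0, "heck": 1.0},
    "strong": {"verynastyword": 5.0},
}
REMOVE_CUTOFF = 5.0  # assumed score at which a document is removed

def score(text, query):
    """Weighted count of the query's terms in the text."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return sum(weight * counts[term] for term, weight in query.items())

def rank(documents):
    """Return (score, doc_id) pairs, highest-scoring document first."""
    ranked = []
    for doc_id, text in documents.items():
        # A document's rank is its best score across all stored queries.
        best = max(score(text, q) for q in STORED_QUERIES.values())
        ranked.append((best, doc_id))
    ranked.sort(reverse=True)
    return ranked

def to_remove(documents):
    """IDs of documents that rank at or above the removal cutoff."""
    return [doc_id for s, doc_id in rank(documents) if s >= REMOVE_CUTOFF]
```

For example, calling `to_remove({"a": "hello world", "b": "verynastyword indeed"})` would flag only document `b`. The hard part, as noted above, is assembling queries that are worth running.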