in reply to Content "Censorshop" : Kid friendliness

As to the first and last on your list: Punt. Get somebody else to make a list, then follow it religiously, so that you have somebody else to point the finger of blame at. Such lists exist; the most important is probably the one from the "seven dirty words" Supreme Court ruling in the US, and other countries probably have similar things. See also Regexp::Common, which includes a regex for testing for them.

Non-plaintext text-like things can, for the most part, be scanned by semi-automatic means: there are PDF parsers on CPAN, and MS Word files can be scanned normally, since strings will show you the text. Sometimes, though, you will catch "bad" language that was later deleted, because Word tends to append rather than rewrite, at least in "fast save" mode.
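
For anybody who doesn't have strings handy, here is a rough sketch of the same idea in Python. It is only an approximation of what strings does, and the function name printable_runs and the minimum run length of 4 are my own choices. It assumes the interesting text is stored either as plain 8-bit ASCII or as UTF-16LE, which I believe covers most Word files, but treat that as an assumption.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable characters found in binary data.

    Looks for runs of printable ASCII bytes, and for the same
    characters stored as UTF-16LE (each byte followed by a NUL).
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs
```

You would read the file in binary mode, pass the bytes to printable_runs, and then run your word list over whatever comes back. Text that was "deleted" but is still sitting in the file will show up too.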

Unfortunately, doing images requires hardcore AI. Beyond that, getting it right for everybody is completely impossible, because the standard varies. For example, a picture of a woman's breasts is acceptable in some countries; in some, it's acceptable only when talking about medical things; in some, it's simply unacceptable. For that matter, in some places, pictures of women's /faces/ are obscene. In other places, pictures of people at all are considered graven images. The last sort probably don't have computers, though, because they most likely consider them evil.

If you want to do this generally, you have to rely on trusted moderators. Sorry.


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Content "Censorshop" : Kid friendliness
by jonadab (Parson) on Sep 20, 2003 at 18:35 UTC

    Absolutely right, you punt.

    The problem is that real content filtering is demonstrably AI-complete, which in layman's terms means computers aren't smart enough to do it. Keyword filtering falls flat on its face: you end up filtering out stuff you don't want to filter out and leaving in stuff that's obviously obscene. Nevertheless, this is the kind of filtering you want to do, because all the other kinds are worse. If possible, involve a human in the process, if only by holding any post with medium-class "gray" words until a moderator approves it for public viewing. Some words you can get away with just banning entirely, but others (e.g., nipple) are very problematic; ban them and you end up blocking conversation about baby bottles and engine mechanics. These you probably want to greylist and pass through a moderator. Also be aware that no matter what words you block, people who want to make sexual innuendo will do so; the only way to fix that is to pass *everything* through a human moderator.

    If you can't punt to human moderators, wheedle, cajole, trick, or coerce someone else into giving you a list of words to block. It's an impossible task to get the list right, and you DO NOT want to have responsibility for the list rest on you. But if you can punt in realtime to human moderator(s), that's better. Blacklist the big bad four-letter words whose only meaning involves excrement or intercourse, and put any other nasty words on a greylist that flags the post for a moderator to examine and approve or disapprove. That way you don't end up blocking conversation about cancer, Mr. Gephardt, and so on. And if you can get the mods to also look at other posts, even if only to spot-check them from time to time, do so, because you WILL have a few idiots who think it's their job to find any dirty words or phrases that your filters ignore and use them in every post.
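
The blacklist/greylist split described above might look something like this. It is a minimal sketch in Python, not a finished filter: the names BLACKLIST, GREYLIST, and triage are made up for illustration, and the word sets are placeholders for whatever list you manage to get from somebody else. It matches whole words only, which is an assumption on my part about how you would want to avoid tripping over innocent words that merely contain a bad one.

```python
import re

# Placeholder lists; the real ones should come from somebody else.
BLACKLIST = {"badword"}   # banned outright
GREYLIST = {"nipple"}     # held for a human moderator

WORD_RE = re.compile(r"[a-z']+")

def triage(post: str) -> str:
    """Classify a post as 'reject', 'moderate', or 'accept'."""
    # Compare whole lowercase words, never substrings.
    words = set(WORD_RE.findall(post.lower()))
    if words & BLACKLIST:
        return "reject"
    if words & GREYLIST:
        return "moderate"
    return "accept"
```

Anything that comes back as "moderate" goes into a queue for a human. Nothing here catches innuendo, and it does nothing about the spot-checking of accepted posts, which still needs people.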


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: Re: Content "Censorshop" : Kid friendliness
by Arbogast (Monk) on Sep 20, 2003 at 15:31 UTC
    No doubt a computer could filter out a list of certain words. But I can't imagine how any site with public postings can be made "family friendly" without a human monitoring each photo or post. And whatever standard you uphold, you will offend someone. I would wager that if you check back in a couple of decades, a computer could filter some photos.

    Also, the most offensive posts often contain no foul language. One can easily imagine very sexually explicit language that mentions only ordinary, nonsexual objects. Likewise, to take an example from Mark Twain, I believe many people would find this poem extremely offensive. Others would consider it educational for youngsters. Yet how could a computer determine that it was offensive when humans can't agree? Perhaps only by filtering any reference to God (since any content mentioning God will offend some sect?)

    http://www.lone-star.net/mall/literature/warpray.htm