The definition of "CAPTCHA" is not "text that's been put into an image and made hard to OCR". It's a problem that's hard for a computer, but easy for a human, to solve. There are a lot of things that meet the criteria. The real challenge is to create a practical CAPTCHA, which meets all of the following:

  1. Impractical for average computing resources to solve ("hard enough")
  2. Simple for almost any human in your target audience to solve correctly
  3. Has solutions which can be easily and automatically checked for correctness

The last two are where the trouble lies, and the second is the genuinely hard one. Leaving aside multi-lingual concerns, you still have cultural and experiential differences. If, for example, your challenge deals with any kind of conceptual or qualitative measure, you're going to have people who can't pass because they disagree with the expected answer. It's a similar problem to writing good test questions.

The final item isn't hard, but it is limiting. You need to keep your answers simple enough that they can be reliably tested for correctness -- commonly, this means looking up the answer in a database. Right now, the two classes of captcha that work reasonably well for this are the single-word response (like the usual "obfuscated text" captchas) and the multiple-choice test.
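To make the "look up the answer in a database" step concrete, here is a minimal Python sketch of checking a single-word response. Everything in it is an assumption for illustration: the in-memory dict stands in for a real database, and the names `issue_challenge` and `check_response` are hypothetical. Answers are stored as salted digests and each challenge can be attempted once.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-in for the answer database:
# challenge id -> (salt, salted SHA-256 digest of the expected answer).
_answers = {}


def _digest(salt: bytes, answer: str) -> bytes:
    # Normalise case and surrounding whitespace so " Tree " matches "tree".
    return hashlib.sha256(salt + answer.strip().lower().encode("utf-8")).digest()


def issue_challenge(expected_answer: str) -> str:
    """Store the expected answer and return an opaque challenge id."""
    challenge_id = secrets.token_hex(8)
    salt = secrets.token_bytes(16)
    _answers[challenge_id] = (salt, _digest(salt, expected_answer))
    return challenge_id


def check_response(challenge_id: str, response: str) -> bool:
    """Look up the stored answer; a challenge is discarded after one attempt."""
    record = _answers.pop(challenge_id, None)
    if record is None:
        return False
    salt, expected = record
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(expected, _digest(salt, response))
```

Discarding the challenge after one attempt is a design choice in this sketch, not something the scheme requires; it keeps a bot from retrying the same image or question until it gets lucky.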

The former can be defeated with OCR technology, and the latter can be trivially brute-forced.
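The brute-force claim is easy to put numbers on. The sketch below assumes a bot that guesses uniformly at random and gets a fresh, independent challenge on every attempt; the function names are hypothetical.

```python
def guess_success_probability(choices: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent random guesses is right."""
    if choices < 1 or attempts < 0:
        raise ValueError("choices must be >= 1 and attempts must be >= 0")
    return 1.0 - (1.0 - 1.0 / choices) ** attempts


def expected_attempts(choices: int) -> float:
    """Mean number of random guesses needed for the first success."""
    if choices < 1:
        raise ValueError("choices must be >= 1")
    return float(choices)
```

Under those assumptions, a four-option question falls to random guessing about 94% of the time within ten attempts, which is why a multiple-choice test only helps if attempts are limited or expensive.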

There are also audio-based captchas out there: a blind user can click on an accessible link and receive an audio clip of the word or letters to type in. Unfortunately, speech-recognition software is getting pretty good, too.

Of course, there are other ways to solve the authentication problem. These have varying degrees of practicality depending on your application. Some examples:

  1. Provide some sort of physical authenticator: for example, send a postal letter containing the initial password for the new account. This isn't practical for free or cheap services, or where your customers will leave if they have to wait a few days to get an account.
  2. Use phone verification. PayPal used to, for certain things, have an automated system call the phone number you provided and ask you for validation (entering the last 4 digits of your bank account on the touch-tone keypad, for example). They warned you about it on the enrollment form, and they called within a couple of minutes. This is more expensive than the above, but much faster.
  3. Use another messaging technology. Send an SMS message with a verification code to the user's mobile phone, and never send to the same phone twice in a given amount of time (one month?). Google does something like this (you must sign up to Gmail by referral or by using a mobile phone). It's conceivable that you could use a common IM client as well, though that is much more open to bot abuse.
  4. Use referral: no one can join unless someone invites them. This can be problematic for sites that value disparate membership, or that are targeted to the public at large. Especially if you're trying to attract paying members, this is difficult.
  5. Use approval escalation. On message boards and the like, require that a new account's posts be approved until they've had d days and p posts without submitting any spam. Of course, this is either expensive or requires a pool of volunteer labor, which in turn requires a lot of management time. It also doesn't stop bots from consuming resources, since they still submit the spam and store it in your database (at least for a while).
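The approval-escalation rule in item 5 can be sketched in a few lines of Python. The `Account` fields, the function name, and the default thresholds (14 days, 10 posts) are all hypothetical placeholders for the d and p above, and the sketch takes "without submitting any spam" literally: one rejected post keeps the account in moderation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    created_at: datetime
    approved_posts: int = 0  # posts a moderator has let through
    spam_posts: int = 0      # posts a moderator has rejected as spam


def needs_moderation(account: Account, now: datetime,
                     min_days: int = 14, min_posts: int = 10) -> bool:
    """True while the account's posts should still be held for approval."""
    if account.spam_posts > 0:
        return True
    old_enough = now - account.created_at >= timedelta(days=min_days)
    enough_posts = account.approved_posts >= min_posts
    # Both conditions must hold: d days AND p clean posts.
    return not (old_enough and enough_posts)
```

Note that this only decides who still needs review; it does nothing about the cost of the review itself or the storage used by held spam.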

I guess the point of all this isn't "how do we make universally-acceptable captchas", but understanding how to solve the general "prove you're a person" problem. It's not trivial, and you'll have to either invest some time and/or money in the problem, or accept a failure rate.

<radiant.matrix>
A collection of thoughts and links from the minds of geeks
The Code that can be seen is not the true Code
I haven't found a problem yet that can't be solved by a well-placed trebuchet

In reply to Re: If CAPTCHA isn't the answer. What is? by radiantmatrix
in thread If CAPTCHA isn't the answer. What is? by BMaximus
