•Re: Another way to get around automated bots
by merlyn (Sage) on May 05, 2004 at 14:18 UTC
|
And again, this is a technical solution that has legal ramifications preventing its use for all but tiny toy sites.
I also don't see the advantage over just using a simple image. For one, you just punished a dialup user pretty badly.
| [reply] |
|
Yes, dealing with the various disabled users is tough: for them to make use of it, the computer needs to be able to see it (and then speak it, in the case of someone who is blind or sight-limited, or whatever the current PC term is), and if the computer can see it, then any bot can see it.
I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed.
The advantage over using an image is that it can't be scanned by a bot and read - if you have an image on a page, the bot can look for the image and then pull the data from the image (easiest way is using neural net training - well, I guess not "easiest" but most effective for varied image types).
But if there is no image there, then that particular bot can't find anything. The text is also not on the page, so it can't find anything either. The bot then has to parse the appropriate content on the page (which is easy if it is the only thing on the page, harder as you add more content and dynamically change how you reference the classes) and rebuild it as an image, and then do the analysis on it.
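The "parse the content and rebuild it" step can be sketched roughly as below. The markup is hypothetical (a `row` class for each line of cells, `b` for dark cells, `w` for light ones) and is not the markup used by the original post. The sketch also assumes the bot already knows which class names mean "ink", which is exactly the part that gets harder when class names change dynamically.

```python
from html.parser import HTMLParser


class DivGrid(HTMLParser):
    """Rebuild a 0/1 bitmap from rows of styled divs (hypothetical markup)."""

    def __init__(self, ink_classes):
        super().__init__()
        self.ink_classes = set(ink_classes)  # class names assumed to mean "dark"
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "row" in classes:
            self.rows.append([])  # a new line of cells starts here
        elif self.rows:
            # any other div inside a row counts as one pixel
            self.rows[-1].append(1 if self.ink_classes & set(classes) else 0)


page = (
    '<div class="row"><div class="b"></div><div class="w"></div></div>'
    '<div class="row"><div class="b"></div><div class="b"></div></div>'
)
parser = DivGrid(ink_classes={"b"})
parser.feed(page)
bitmap = parser.rows
print(bitmap)
```

Once the bitmap exists, the bot is back to the ordinary image-reading problem.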
There are ways of making it much harder for the bot to rebuild it.
Yeah, it is about 10K to represent the same as what a 1K PNG could have done - certainly not ideal for showing images - but this wouldn't be something that you would do on every page either.
That is about a 2 to 3 second download for a 33kbps modem user.
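A quick check of that arithmetic, assuming a nominal 33.6 kbps link and ignoring latency, protocol overhead and modem compression (all of which are simplifications):

```python
def download_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time: payload bits divided by the link rate."""
    return size_bytes * 8 / link_bits_per_second


MODEM_BPS = 33_600  # assumed nominal rate for a "33kbps" modem

div_version = download_seconds(10 * 1024, MODEM_BPS)  # ~10K of divs
png_version = download_seconds(1 * 1024, MODEM_BPS)   # ~1K PNG

print(f"divs: {div_version:.1f} s, PNG: {png_version:.1f} s")
```

Under these assumptions the div version lands between 2 and 3 seconds, against a fraction of a second for the PNG.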
-------------------------------------------------------------------
There are some odd things afoot now, in the Villa Straylight.
| [reply] |
|
I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed.
I'd be careful about those assumptions (disclaimer: people pay me for accessibility work :-)
For government or government-funded sites in the UK, the US and other countries, accessibility is a major issue - contractually or legally, depending on locale. For business sites it's becoming a potential legal/PR minefield.
The advantage over using an image is that it can't be scanned by a bot and read.
Yes it can. Automating a web browser and a screen grab program isn't hard. With a little more effort they can just parse and interpret the HTML directly.
The question is - is it worth the effort for somebody to do this on your site?
The WAI have a nice working paper on the topic, "Inaccessibility of Visually-Oriented Anti-Robot Tests", for those who are interested.
Personally I have found heuristic server-side solutions much more effective. For example:
- Require a response from the user via email
- Keep an eye out for registrations coming from the same IP/domain
- Keep an eye out for registrations with similar data
- Feedback forms with "random" names and a tracking ID to make them do a lot more work to automate the submission.
- ... I'm sure you get the idea...
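One possible reading of the "random names plus tracking ID" item, as a minimal sketch. Every name here (`SERVER_KEY`, `REAL_FIELDS`, `issue_form`, `decode_submission`) is hypothetical. The idea is that each rendered form gets a fresh tracking ID and the field names are derived from it, so a script that replays fixed field names fails. Expiring and single-use tracking IDs are left out for brevity.

```python
import hashlib
import hmac
import secrets
from typing import Optional

SERVER_KEY = secrets.token_bytes(32)        # hypothetical server-side secret
REAL_FIELDS = ("name", "email", "comment")  # hypothetical form fields


def field_alias(tracking_id: str, field: str) -> str:
    """Derive the per-form field name from the tracking ID and the real name."""
    message = f"{tracking_id}:{field}".encode()
    digest = hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:12]


def issue_form():
    """Return a fresh tracking ID and the real-name -> alias map to render."""
    tracking_id = secrets.token_hex(8)
    return tracking_id, {f: field_alias(tracking_id, f) for f in REAL_FIELDS}


def decode_submission(tracking_id: str, posted: dict) -> Optional[dict]:
    """Map aliases back to real names; None if the field names do not match."""
    aliases = {field_alias(tracking_id, f): f for f in REAL_FIELDS}
    if set(posted) != set(aliases):
        return None
    return {aliases[key]: value for key, value in posted.items()}
```

A bot now has to fetch and parse the form before every submission, which is more work but certainly not impossible.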
Depending on your application it may be worth thinking how much a captured registration is worth in the currency of your choice, and then thinking about how many registrations a minimum wage worker could make on your site in an hour. If the math comes out the wrong way you're going to have to rethink anyway.
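That back-of-envelope comparison can be written down as follows. The figures in the example calls are purely illustrative, not data from any real site.

```python
def manual_attack_pays(value_per_registration, hourly_wage, registrations_per_hour):
    """True if one hand-made registration is worth more than it costs in labour."""
    cost_per_registration = hourly_wage / registrations_per_hour
    return value_per_registration > cost_per_registration


# Illustrative numbers only: 6.00 per hour, one registration per minute.
print(manual_attack_pays(0.50, 6.00, 60))  # labour cost 0.10 each, worth 0.50
print(manual_attack_pays(0.05, 6.00, 60))  # labour cost 0.10 each, worth 0.05
```

If the first case describes your site, no bot test will save you on its own.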
| [reply] |
Re: Another way to get around automated bots
by Fletch (Bishop) on May 05, 2004 at 14:56 UTC
|
Scuttlebutt I've heard is that the really determined ones are copying the image, tossing it up on another site and having a hyumon read it (who then gets free pr0n or what not), and sending the result back. This scheme would be just as vulnerable to something similar.
As a somewhat related aside, I was going to submit something similar to the Obfuscated Perl contest a few years back (but didn't because I used GD and the rules excluded using non-core modules).
| [reply] |
Re: Another way to get around automated bots
by andyf (Pilgrim) on May 17, 2004 at 09:01 UTC
|
I think that's jolly inventive, even if it's not entirely practical. Of course plain ASCII art is a similar tactic.
Interestingly I looked at the complementary problem last year for a rabble of grubby greyhat dotcommers in the next office to me - you guessed it, OCR for noisy .gifs (they actually did perfectly legitimate deep-linking searches).
I used Image::Magick to read, normalise, greyscale, blur and threshold the image, then took the highest weighted sum of the AND with a test image. Read: nasty brute-force OCR.
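A toy version of the threshold-then-AND matching step, in plain Python. The 3x3 templates are made up, and the weighting (overlap normalised by the total ink in image and template) is an assumption, since the post does not say which weights were used. The normalise and blur steps are skipped.

```python
# Hypothetical 3x3 glyph templates; "1" marks ink.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def threshold(gray, cutoff=128):
    """Binarise a greyscale grid: values darker than the cutoff count as ink."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def weighted_and(bits, template):
    """AND the image with a template, weighted by the ink in both."""
    tmpl = [[int(ch) for ch in row] for row in template]
    overlap = sum(b & t for brow, trow in zip(bits, tmpl) for b, t in zip(brow, trow))
    ink = sum(map(sum, bits)) + sum(map(sum, tmpl))
    return 2 * overlap / ink


def recognise(gray):
    """Return the template letter with the highest weighted AND score."""
    bits = threshold(gray)
    return max(TEMPLATES, key=lambda ch: weighted_and(bits, TEMPLATES[ch]))


# An "L" with one stray dark pixel in the top-right corner.
noisy_l = [
    [20, 250, 90],
    [30, 245, 200],
    [10, 40, 25],
]
print(recognise(noisy_l))
```

Without some weighting a plain AND count favours whichever template has the most ink, which is presumably why the sum was weighted in the first place.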
Eventually they replaced my code with a far faster C++ implementation that finds minimum distances between FFTs of the images, which quite frankly laughs at Perl (speedwise).
However, they were still having plenty of problems last time I heard.
That is to say, done properly, obfuscated images can be computationally VERY hard to OCR, but it can be done.
Regardless of methodology there is a deeper principle at play here, which connects with what merlyn has to say... eventually you are going to make life so difficult for your end user that any perceptual impairment they have will make reading almost impossible. My (dyslexic) sister has a damn hard time reading those obfuscated .gifs.
My hypothesis, then: if you are prepared to throw enough cycles at the problem, with a good enough algorithm, the machine will always be able to filter the info from a noisy image _better_ than a human can. Hence the general method is flawed if its sole objective is to defeat bots.
A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour.
Even something like
Which dictator has no moustache?
1) Adolf Hitler
2) Augusto Pinochet
3) Saddam Hussein
4) Josef Stalin
5) George W Bush
6) George Papadopoulos
7) Francois "Papa Doc" Duvalier
would fool pretty much any AI :)
Andy
| [reply] |
|
A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour.
In these days of proxies, using an IP blocking approach is pretty much a dead end. Blocking IPs will mean that you'll kill off groups of people using proxies, and they're so easy to fake only the technically dull bad people will be affected.
Without a blocking mechanism it then just comes down to a question of odds.
I also think you'll be surprised at the high false-negative rate you'll get with real humans getting the questions wrong :-)
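To put a number on "a question of odds": a sketch assuming a bot that guesses uniformly at random among the options and is never blocked. Both are assumptions; a smarter bot does better.

```python
def p_pass_within(options, attempts):
    """Chance a uniform random guesser gets at least one answer right."""
    p_wrong = 1 - 1 / options
    return 1 - p_wrong ** attempts


# Seven options per question, as in the example question earlier in the thread.
for attempts in (1, 5, 10, 20):
    print(attempts, round(p_pass_within(7, attempts), 3))
```

With seven options a blind guesser is more likely than not to get through well before ten attempts, so the question format alone buys very little.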
| [reply] |
|
Blocking IPs [...] and they're so easy to fake only the technically dull bad people will be affected.
Wow. It is easy for you to fake an IP and have the results sent back to you? You'll have to explain that before I believe you.
If you are using IP for security, then the only risk from faking IPs is that someone can send you data with a forged IP in hopes of getting you to act on it. Simply requiring a minimal dialogue that includes repeating hard-to-predict data is enough to make such an attack extremely unlikely.
An attacker having control over a block of IP addresses is a separate issue.
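The "minimal dialogue" idea, sketched with a hypothetical `Handshake` class. The server sends an unpredictable nonce to the claimed address and only acts once it is echoed back. A sender who forged the address never receives the nonce and has to guess it.

```python
import hmac
import secrets


class Handshake:
    """Hypothetical challenge/echo check keyed by the claimed IP address."""

    def __init__(self):
        self._pending = {}

    def challenge(self, claimed_ip):
        """Create a nonce; the reply carrying it is sent to claimed_ip."""
        nonce = secrets.token_urlsafe(16)
        self._pending[claimed_ip] = nonce
        return nonce

    def confirm(self, claimed_ip, echoed):
        """Accept only if the echoed value matches; each nonce is single-use."""
        expected = self._pending.pop(claimed_ip, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected.encode(), echoed.encode())
```

This is the same principle that makes TCP sequence numbers awkward for a blind spoofer. It does nothing against an attacker who really does receive traffic for the address.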
| [reply] |
Re: Another way to get around automated bots
by kelan (Deacon) on May 05, 2004 at 14:37 UTC
|
I've played around with something like this before, and indeed I think it's pretty cool, and can be done with any image. The one big downside is that the "size" of the image, in terms of downloading it, is much larger than the image you're replacing. With a gzip enabled server, that might not be so bad however.
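A rough way to see how much gzip helps, using a made-up div markup (not the original program's output) and a checkerboard pattern. A real picture is less regular and will compress less well than this.

```python
import gzip


def div_markup(width, height):
    """Build rows of empty divs whose class encodes a checkerboard pixel."""
    rows = []
    for y in range(height):
        cells = "".join(
            '<div class="{}"></div>'.format("b" if (x + y) % 2 else "w")
            for x in range(width)
        )
        rows.append('<div class="row">{}</div>'.format(cells))
    return "\n".join(rows).encode("ascii")


raw = div_markup(40, 20)
packed = gzip.compress(raw)
print(len(raw), "bytes raw,", len(packed), "bytes gzipped")
```

The per-pixel tags are almost pure repetition, which is what a gzip-enabled server exploits.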
On the other hand, although this is a cool hack, I really hope it doesn't catch on. I usually browse the web with images off so I can skip the annoying advertisements that represent probably 80% of web images nowadays. With this technique, there's no good way to turn it off except to disable displaying divs. And that would probably be a nightmare. I could see unscrupulous advertisers using this technique to get around image blocking and such.
PS. For some fun playing around with this for any random image, download bmp2html. You can modify the source to spit out colored divs, like your program does, instead of colored ASCII characters.
| [reply] |
Re: Another way to get around automated bots
by Anonymous Monk on May 06, 2004 at 07:59 UTC
|
Whatever it is, it doesn't work in Opera 7.23 on Windows... I can't read anything in that, on both your static and dynamic pages. And zooming in doesn't help either. What gives?
| [reply] |
|
Since I don't have Opera on any of my machines, I didn't test it on that (at the end of the write-up that the link points to, I note that I only tested it on a limited set of browsers).
The fellow that created the CSS Pencils test also noted to me that it doesn't work in Opera. It is some bug in the way I did the CSS, but it doesn't mean that it won't work at all - just takes some tweaking.
But yes, if your browser won't render the DIVs correctly, then you sure aren't going to see much of anything useful.
-------------------------------------------------------------------
There are some odd things afoot now, in the Villa Straylight.
| [reply] |