Another way to get around automated bots

•Re: Another way to get around automated bots by merlyn (Sage) on May 05, 2004 at 14:18 UTC
And again, this is a technical solution with legal ramifications that prevent its use on all but tiny toy sites. I also don't see the advantage over just using a simple image. For one, you just punished a dialup user pretty badly. -- Randal L. Schwartz, Perl hacker. Be sure to read my standard disclaimer if this is a reply.
Re: •Re: Another way to get around automated bots by davido (Cardinal) on May 05, 2004 at 17:32 UTC
I was amazed the other day when I set up a PayPal account. I found that in order to set up the account I had to repeat back the numbers that I read from a slightly obscured graphic image. PayPal is hardly a "toy" site. My immediate thought was, "Isn't this what merlyn's always talking about?" Unbelievable that they would use such a limiting method to authenticate users. Dave
Re: •Re: Another way to get around automated bots by AssFace (Pilgrim) on May 05, 2004 at 14:37 UTC
Yes, dealing with the various disabled users is tough, because in order for them to make use of it they need the computer to see it (and then speak it, in the case of someone who is blind, or sight limited, or whatever the current PC terms are), and if the computer can see it, then any bot can see it. I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed. The advantage over using an image is that it can't simply be scanned by a bot and read - if you have an image on a page, the bot can look for the image and then pull the data from it (neural net training is not the "easiest" way, but it is the most effective for varied image types). But if there is no image there, then that particular bot can't find anything. The text is also not on the page, so it can't find that either. The bot then has to parse the appropriate content on the page (which is easy if it is the only thing on the page, harder as you add more content and dynamically change how you reference the classes), rebuild it as an image, and then do the analysis on it. There are ways of making it much harder for the bot to rebuild it. Yeah, it is about 10K to represent what a 1K PNG could have done - certainly not ideal for showing images - but this wouldn't be something that you would do on every page either. That is about a 2 to 3 second download for a 33kbps modem user. ------------------------------------------------------------------- There are some odd things afoot now, in the Villa Straylight.
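For readers who want to see the idea in the post above in concrete terms, here is a minimal sketch in Python. It is not the poster's code: the glyph, the function names and the inline styles are all made up for illustration (the post talks about CSS classes, which this sketch does not use). It draws a tiny bitmap as positioned `div`s and checks the download estimate for 10K over a 33kbps link.

```python
# Hypothetical sketch: draw a 1-bit bitmap as absolutely positioned <div>s.
# GLYPH, bitmap_to_divs and download_seconds are illustrative names only.
GLYPH = [
    "#####",
    "#...#",
    "#####",
    "#...#",
    "#...#",
]


def bitmap_to_divs(rows, pixel=4, color="#000"):
    """Return HTML that draws each '#' cell as a small colored <div>."""
    parts = []
    for y, row in enumerate(rows):
        for x, cell in enumerate(row):
            if cell == "#":
                parts.append(
                    f'<div style="position:absolute;left:{x * pixel}px;'
                    f'top:{y * pixel}px;width:{pixel}px;height:{pixel}px;'
                    f'background:{color}"></div>'
                )
    width = max(len(row) for row in rows) * pixel
    height = len(rows) * pixel
    wrapper = f'<div style="position:relative;width:{width}px;height:{height}px">'
    return wrapper + "".join(parts) + "</div>"


def download_seconds(size_bytes, link_bits_per_second):
    """Idealised transfer time, ignoring latency and protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


html = bitmap_to_divs(GLYPH)
print(len(html), "bytes of markup for a 5x5 glyph")
print(round(download_seconds(10 * 1024, 33_000), 1), "seconds for 10K at 33kbps")
```

Even this 5x5 glyph costs well over a kilobyte of markup, which is the size trade-off the post describes.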
Re^3: Another way to get around automated bots by adrianh (Chancellor) on May 05, 2004 at 16:30 UTC
"I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed."
I'd be careful about those assumptions (disclaimer: people pay me for accessibility work :-) For government or government-funded sites in the UK, US and other countries, accessibility is a major issue - contractually or legally, depending on locale. For business sites it's becoming a potential legal/PR minefield.
"The advantage over using an image is that it can't be scanned by a bot and read."
Yes it can. Automating a web browser and a screen grab program isn't hard. With a little more effort they can just parse and interpret the HTML directly. The question is whether it is worth the effort for somebody to do this on your site.
The WAI has a nice working paper on the topic, Inaccessibility of Visually-Oriented Anti-Robot Tests, for those who are interested.
Personally I have found heuristic server-side solutions much more effective. For example:
- Require a response from the user via email
- Keep an eye out for registrations coming from the same IP/domain
- Keep an eye out for registrations with similar data
- Feedback forms with "random" names and a tracking ID to make them do a lot more work to automate the submission
- ... I'm sure you get the idea...
Depending on your application it may be worth thinking about how much a captured registration is worth in the currency of your choice, and then thinking about how many registrations a minimum wage worker could make on your site in an hour. If the math comes out the wrong way you're going to have to rethink anyway.
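The "random names plus tracking ID" suggestion in the post above could look roughly like the following Python sketch. This is one possible reading, not the poster's implementation: the secret, the field list and every function name are assumptions. Each served form gets a fresh tracking ID, and the field names are derived from it, so a script cannot hard-code them.

```python
import hashlib
import hmac
import secrets

# Hypothetical values: a real site would load the secret from configuration.
SECRET_KEY = b"replace-with-a-real-secret"
LOGICAL_FIELDS = ("name", "email", "comment")


def field_alias(tracking_id, logical_name):
    """Derive the per-form field name for one logical field."""
    message = f"{tracking_id}:{logical_name}".encode()
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return "f" + digest[:12]


def new_form():
    """Create a tracking ID and the randomised field names to render."""
    tracking_id = secrets.token_hex(8)
    aliases = {name: field_alias(tracking_id, name) for name in LOGICAL_FIELDS}
    return tracking_id, aliases


def decode_submission(tracking_id, posted):
    """Map posted aliases back to logical names; None if a field is missing."""
    decoded = {}
    for name in LOGICAL_FIELDS:
        alias = field_alias(tracking_id, name)
        if alias not in posted:
            return None
        decoded[name] = posted[alias]
    return decoded
```

A bot now has to fetch and parse the form for every submission. A fuller version would presumably also record issued tracking IDs and expire them, so that one fetched form cannot be replayed indefinitely.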
Re: Another way to get around automated bots by Fletch (Bishop) on May 05, 2004 at 14:56 UTC
Scuttlebutt I've heard is that the really determined ones are copying the image, tossing it up on another site and having a hyumon read it (who then gets free pr0n or what not), and sending the result back. This scheme would be just as vulnerable to something similar. As a somewhat related aside, I was going to submit something similar to the Obfuscated Perl contest a few years back (but didn't, because I used GD and the rules excluded using non-core modules).
Re: Another way to get around automated bots by andyf (Pilgrim) on May 17, 2004 at 09:01 UTC
I think that's jolly inventive, even if it's not entirely practical. Of course plain ASCII art is a similar tactic. Interestingly, I looked at the complementary problem last year for a rabble of grubby greyhat dotcommers in the next office to me - you guessed it, OCR for noisy .gifs (they actually did perfectly legitimate deeplinking searches). I used Image::Magick to read, normalise, greyscale, blur and threshold the image, then took the highest weighted sum of the AND with a test image - read: nasty brute force OCR. Eventually they replaced my code with a far faster C++ implementation that finds minimum distances between FFTs of the images, which quite frankly laughs at Perl (speedwise). However, they still had plenty of problems last time I heard. That is to say, done properly, obfuscated images can be computationally VERY hard to OCR, but it can be done. Regardless of methodology there is a deeper principle at play here, which connects with what merlyn has to say... eventually you are going to make life so difficult for your end user that any perceptual impairment they have will make reading almost impossible. My (dyslexic) sister has a damn hard time reading those obfuscated .gifs. My hypothesis, then: if you are prepared to throw enough cycles at the problem, with a good enough algorithm, the machine will always be able to filter the info from a noisy image _better_ than a human can. Hence the general method is flawed if its sole objective is to defeat bots. A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour. Even something like: Which dictator has no moustache? 1) Adolf Hitler 2) Augusto Pinochet 3) Saddam Hussein 4) Josef Stalin 5) George W Bush 6) George Papadopoulos 7) Francois "Papa Doc" Duvalier would fool pretty much any AI :) Andy
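The brute force matching described in the post above (threshold, then score the AND against test images) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the original Perl: the images are tiny hand-written greyscale grids, the blur step is left out, the templates are invented, and the "weighting" is a Dice-style normalisation chosen here so that a sparse template does not win just by being a subset of the ink.

```python
def threshold(grey, cutoff=128):
    """Greyscale grid (0-255, dark ink on light paper) -> 1 for ink, 0 for background."""
    return [[1 if value < cutoff else 0 for value in row] for row in grey]


def overlap_score(bits, template):
    """Sum of the pixelwise AND, normalised by the total ink in both grids."""
    hits = sum(b & t for brow, trow in zip(bits, template) for b, t in zip(brow, trow))
    total_ink = sum(map(sum, bits)) + sum(map(sum, template))
    return 2 * hits / total_ink if total_ink else 0.0


# Hypothetical 3x3 test images, one per character.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def classify(grey):
    """Return the template character with the highest overlap score."""
    bits = threshold(grey)
    return max(TEMPLATES, key=lambda ch: overlap_score(bits, TEMPLATES[ch]))


# An "L" with one dark noise pixel in the middle row.
noisy_l = [
    [30, 200, 220],
    [40, 210, 90],
    [20, 50, 60],
]
print(classify(noisy_l))
```

The single noise pixel lowers the score for "L" but does not change the winner, which is roughly why this kind of matching survives mild obfuscation and struggles with heavy distortion.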
Re^2: Another way to get around automated bots by adrianh (Chancellor) on May 17, 2004 at 09:31 UTC
"A better method is to rely on questions from current events news. Make it multiple choice, and make it so that 3 wrong answers out of 5 blocks the IP for an hour."
In these days of proxies, using an IP blocking approach is pretty much a dead end. Blocking IPs will mean that you'll kill off groups of people using proxies, and they're so easy to fake only the technically dull bad people will be affected. Without a blocking mechanism it then just comes down to a question of odds. I also think you'll be surprised at the high false-negative rate you'll get with real humans getting the questions wrong :-)
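To put rough numbers on the "question of odds" in the post above, here is a small Python sketch. It assumes a bot that guesses uniformly at random on independent questions, which is a simplification, and the function names are made up for illustration.

```python
from math import comb


def pass_probability(choices, attempts):
    """Chance that random guessing gets at least one question right."""
    return 1 - (1 - 1 / choices) ** attempts


def lockout_probability(choices, window=5, max_wrong=3):
    """Chance that a random guesser gets at least max_wrong of window answers wrong."""
    p_wrong = 1 - 1 / choices
    return sum(
        comb(window, k) * p_wrong**k * (1 - p_wrong) ** (window - k)
        for k in range(max_wrong, window + 1)
    )


# Seven choices, as in the dictator example earlier in the thread.
print(f"passes within 7 tries, no blocking: {pass_probability(7, 7):.2f}")
print(f"trips the 3-of-5 block: {lockout_probability(7):.3f}")
```

Without blocking, a guesser gets through about two times in three within seven tries. With the 3-of-5 rule a single address is almost always blocked, so the scheme stands or falls on whether blocking by IP is workable.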
Re^3 : Another way to get around automated bots (fake IP) by tye (Sage) on May 17, 2004 at 14:15 UTC
"Blocking IPs [...] and they're so easy to fake only the technically dull bad people will be affected."
Wow. It is easy for you to fake an IP and have the results sent back to you? You'll have to explain that before I believe you. If you are using IP for security, then the only risk from faking IPs is that someone can send you data with a forged IP in hopes of getting you to act on it. Simply requiring a minimal dialogue that includes repeating hard-to-predict data is enough to make such extremely unlikely. An attacker having control over a block of IP addresses is a separate issue. - tye
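The "minimal dialogue that includes repeating hard-to-predict data" from the post above might be sketched like this in Python. The class and method names are hypothetical, and real transport is left out: the point is only that the token is sent to the claimed address, so a sender forging that address never sees it and cannot echo it back.

```python
import secrets


class ChallengeTable:
    """Hypothetical one-time challenge store keyed by claimed source address."""

    def __init__(self):
        self._pending = {}

    def issue(self, claimed_ip):
        """Create a token; the caller sends it *to* claimed_ip."""
        token = secrets.token_urlsafe(16)
        self._pending[claimed_ip] = token
        return token

    def verify(self, claimed_ip, echoed):
        """Accept only if the echoed value matches; each token is single use."""
        expected = self._pending.pop(claimed_ip, None)
        return expected is not None and secrets.compare_digest(expected, echoed)


table = ChallengeTable()
token = table.issue("192.0.2.10")
print(table.verify("192.0.2.10", token))    # the real host can repeat the token
print(table.verify("192.0.2.10", "guess"))  # a forger has nothing valid to repeat
```

As the post notes, this does nothing against an attacker who genuinely controls a block of addresses.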
Re^4 : Another way to get around automated bots (fake IP) by adrianh (Chancellor) on May 17, 2004 at 15:34 UTC
Re: Re^3 : Another way to get around automated bots (fake IP) by andyf (Pilgrim) on May 17, 2004 at 18:43 UTC
Re^2: Another way to get around automated bots by Nkuvu (Priest) on May 17, 2004 at 17:20 UTC
If you do implement that multiple question thing, let me know so I can avoid the website, OK? From your list of seven dictators, there are three names I don't recognize, and four names that I wouldn't be able to associate a face with... Update: Note that this was a flippant response to your flippant example. :)
Re: Another way to get around automated bots by kelan (Deacon) on May 05, 2004 at 14:37 UTC
I've played around with something like this before, and indeed I think it's pretty cool, and can be done with any image. The one big downside is that the "size" of the image, in terms of downloading it, is much larger than the image you're replacing. With a gzip-enabled server, that might not be so bad, however. On the other hand, although this is a cool hack, I really hope it doesn't catch on. I usually browse the web with images off so I can skip the annoying advertisements that represent probably 80% of web images nowadays. With this technique, there's no good way to turn it off except to disable displaying `div`s. And that would probably be a nightmare. I could see unscrupulous advertisers using this technique to get around image blocking and such. PS. For some fun playing around with this for any random image, download bmp2html. You can modify the source to spit out colored `div`s, like your program does, instead of colored ASCII characters.
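The gzip remark in the post above is easy to check on synthetic markup. The following Python sketch uses made-up `div` markup rather than output from the original program or from bmp2html, so the exact numbers will differ for a real page, but markup this repetitive generally compresses very well.

```python
import gzip


def div_row(count, pixel=4):
    """Build a row of hypothetical 'pixel' divs, similar in shape to the technique."""
    return "".join(
        f'<div style="position:absolute;left:{x * pixel}px;top:0px;'
        f'width:{pixel}px;height:{pixel}px;background:#000"></div>'
        for x in range(count)
    )


markup = div_row(100).encode("utf-8")
compressed = gzip.compress(markup)
print(len(markup), "bytes raw,", len(compressed), "bytes gzipped")
```

This only measures transfer size; the browser still has to lay out every `div` after decompression.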
•Re: Re: Another way to get around automated bots by merlyn (Sage) on May 17, 2004 at 16:54 UTC
`bmp2html` was one of the inspirations for my column that does that using ImageMagick. -- Randal L. Schwartz, Perl hacker. Be sure to read my standard disclaimer if this is a reply.
Re: •Re: Re: Another way to get around automated bots by kelan (Deacon) on May 17, 2004 at 19:55 UTC
In that article it mentions a sample image, `col53-fig.gif`. Is that available somewhere? I'm curious to see the final output, but I'm too lazy to install ImageMagick and GD :)
Re: Another way to get around automated bots by Anonymous Monk on May 06, 2004 at 07:59 UTC
Whatever it is, it doesn't work in Opera 7.23 on Windows... I can't read anything in that, on either your static or your dynamic pages. And zooming in doesn't help either. What gives?
Re: Re: Another way to get around automated bots by AssFace (Pilgrim) on May 06, 2004 at 12:19 UTC
Since I don't have Opera on any of my machines, I didn't test it on that (at the end of the write-up that the link points to, I note that I only tested it on a limited set of browsers). The fellow that created the CSS Pencils test also noted to me that it doesn't work in Opera. It is some bug in the way I did the CSS, but it doesn't mean that it won't work at all - it just takes some tweaking. But yes, if your browser won't render the DIVs correctly, then you sure aren't going to see much of anything useful. ------------------------------------------------------------------- There are some odd things afoot now, in the Villa Straylight.

