Re: Image Character Recognition
by hardburn (Abbot) on Dec 04, 2003 at 21:06 UTC
|
You'll notice that if the developers on those sites have done it right, there are various forms of noise in those images (lines in a grid, pseudo-random placement of pixels, etc.). The noise is there specifically to foul OCR software in order to stop people from doing precisely what you're trying to do.
The problem with these methods is that they pretty much stop any blind user from using these web sites, but that's another issue.
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
|
The problem with these methods is that they pretty much stop any blind user from using these web sites, but that's another issue.
A good way to handle that issue is to also offer a sound bite that reads out the word. This is also hard for automated scripts to bypass, and if a blind person is using a computer you can pretty much guarantee that they have sound.
I don't know of any websites that actually implement both methods, though. I guess the issue with implementation is that the graphic image is easy to generate randomly in software, but the sound bite is much more challenging to generate!
Anyway, this is getting off topic, so I will stop here...
Re: Image Character Recognition
by holo (Monk) on Dec 04, 2003 at 20:51 UTC
|
Re: Image Character Recognition
by elusion (Curate) on Dec 04, 2003 at 21:55 UTC
|
jcwren wrote a program that does this, though it doesn't work around noise or zig zag lines. It's an interesting program and well worth a look: A little fun with merlyn.
Re: Image Character Recognition
by Coruscate (Sexton) on Dec 05, 2003 at 06:58 UTC
|
I would just like to make sure of one point. Have you read the terms of use of the site you are trying to do this to? They put these things out there for a reason: they don't want you to automate tasks on their websites (usually just registrations, or unlocking an "abused account" after too many successive failed logins, which is what Yahoo does). Just make sure you're being a "good customer". Don't be going and looking for trouble :)
Re: Image Character Recognition
by Roger (Parson) on Dec 05, 2003 at 05:04 UTC
|
Funny, I was actually thinking about doing something like this the other day. I will write down what I think is a strategy for doing 'intelligent' character recognition. I am quite confident that this technique can recognise printed characters in the noisy-ish images generated by sites like PayPal.
Core Component
A multi-layer, back-propagating neural network. I will probably use the AI::NeuralNet::BackProp module to do this. The neural net is then pre-trained with the fonts to be recognised.
Image Processing
You will definitely need to clean up the image somehow before feeding it into the character recognition engine. The pre-processing would involve:
- color image -> black/white conversion (to simplify the recognition)
- noise reduction, including lines that run across the image
- a pixel density count and statistics collection, to determine the text/character boundaries
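To make those pre-processing steps a bit more concrete, here is a rough sketch in Python (standard library only). The function names, the threshold of 128 and the "isolated pixel" rule are my own assumptions for illustration, not anything PayPal-specific, and it makes no attempt at removing lines that run across the image.

```python
def to_black_white(pixels, threshold=128):
    """Convert rows of (r, g, b) tuples to 1 (ink) or 0 (background)."""
    return [[1 if (r + g + b) / 3 < threshold else 0 for r, g, b in row]
            for row in pixels]


def remove_specks(bitmap):
    """Clear ink pixels that have no ink neighbour (8-connected)."""
    h, w = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    for y in range(h):
        for x in range(w):
            if bitmap[y][x]:
                # Count ink in the 3x3 window, minus the pixel itself.
                neighbours = sum(
                    bitmap[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                ) - 1
                if neighbours == 0:
                    cleaned[y][x] = 0
    return cleaned


def character_spans(bitmap):
    """Use per-column ink counts to find (start, end) column ranges with ink."""
    density = [sum(col) for col in zip(*bitmap)]
    spans, start = [], None
    for x, count in enumerate(density):
        if count and start is None:
            start = x
        elif not count and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, len(density)))
    return spans
```

Each span returned by character_spans (end column exclusive) marks one candidate character, assuming the characters do not touch or overlap horizontally.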
Character Recognition Process
- Input is an array of character bitmaps captured by the pre-processing steps
- Feed the bitmap into the neural network, and get the best estimate of the character it contains
- Output the characters recognised
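For what it's worth, the recognition step could look roughly like the sketch below. This is a hand-rolled toy network in Python, not the AI::NeuralNet::BackProp API; the class name TinyNet, the 3x3 glyphs, the hidden-layer size and the learning rate are all made up for illustration, and the code is untested against real images.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNet:
    """One hidden layer of sigmoid units, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one extra entry for the bias term.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, inputs):
        x = list(inputs) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        h = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in self.w2]
        return x, h, output

    def train(self, inputs, target, rate=0.5):
        x, h, output = self.forward(inputs)
        # Error terms at the output layer (squared error, sigmoid derivative).
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
        # Propagate the error back to the hidden units (bias unit excluded).
        d_hid = [h[j] * (1 - h[j])
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += rate * d_out[k] * h[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += rate * d_hid[j] * x[i]

    def classify(self, inputs):
        output = self.forward(inputs)[2]
        return output.index(max(output))


# Hypothetical 3x3 "font", flattened row by row (1 = ink).
GLYPHS = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
}
labels = sorted(GLYPHS)
net = TinyNet(9, 4, len(labels))
for _ in range(3000):  # pre-train on the known font
    for index, label in enumerate(labels):
        target = [1.0 if i == index else 0.0 for i in range(len(labels))]
        net.train(GLYPHS[label], target)
```

After training, labels[net.classify(bitmap)] gives the best estimate for a flattened character bitmap. Real glyphs would be much larger than 3x3 and would need many noisy training samples per character.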
Re: Image Character Recognition
by EvdB (Deacon) on Dec 05, 2003 at 08:41 UTC
|
This should be fairly easy to achieve:
1. By looking at several images, extract a sample of each character and save these image files locally.
2. Take the image that you want to 'read' and apply a Fourier transform to it.
3. For each character you extracted in step 1, multiply its (complex-conjugated) Fourier transform by the Fourier transform from step 2 and transform the product back. This gives the cross-correlation of the character with the image.
4. If a maximum occurs, note its position.
5. After every character has been tried, list them in the order they appear in the image.
Voila - you have the letters!
PS: you should check the maths here, but I believe that it is roughly right. It's a long time since I did this at Uni.
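A small Python sketch of the matching step, for anyone who wants to check the maths. It reads the matching as a cross-correlation, that is, it multiplies the image's transform by the complex conjugate of the padded character's transform and transforms back; that reading is my assumption. The function names are made up, and the naive DFT is only there to keep the example self-contained (a real FFT library would be used in practice). The correlation wraps around at the image edges.

```python
import cmath


def dft(seq, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(seq[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def dft2(grid, inverse=False):
    """2-D DFT: transform every row, then every column."""
    rows = [dft(row, inverse) for row in grid]
    cols = [dft(list(col), inverse) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def pad(glyph, height, width):
    """Zero-pad a small character bitmap to the size of the full image."""
    padded = [[0] * width for _ in range(height)]
    for y, row in enumerate(glyph):
        for x, value in enumerate(row):
            padded[y][x] = value
    return padded


def best_match(image, glyph):
    """Return the (row, col) offset where the glyph correlates most strongly."""
    h, w = len(image), len(image[0])
    f_image = dft2(image)
    f_glyph = dft2(pad(glyph, h, w))
    # Correlation theorem: multiply by the complex conjugate, then invert.
    product = [[a * b.conjugate() for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(f_image, f_glyph)]
    corr = dft2(product, inverse=True)
    scores = {(y, x): corr[y][x].real for y in range(h) for x in range(w)}
    return max(scores, key=scores.get)


glyph = [[1, 1, 1],
         [0, 1, 0],
         [0, 1, 0]]
image = [[0] * 8 for _ in range(8)]
for y, row in enumerate(glyph):
    for x, value in enumerate(row):
        image[2 + y][3 + x] = value  # place the character at row 2, column 3
image[6][0] = 1  # one stray noise pixel
print(best_match(image, glyph))  # prints (2, 3)
```

Plain correlation rewards any large blob of ink, so on real images the scores would probably need normalising before comparing different characters.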
--tidiness is the memory loss of environmental mnemonics
Re: Image Character Recognition
by zentara (Archbishop) on Dec 05, 2003 at 16:31 UTC
|
Try ocrad
It takes a pbm, but it seems to work pretty well, at least for its examples. :-)
Re: Image Character Recognition
by dragonchild (Archbishop) on Dec 05, 2003 at 13:29 UTC
|
Try A little fun with merlyn. jcwren plays around with something similar.
------
We are the carpenters and bricklayers of the Information Age.
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.