Re: OCR Code
by sauoq (Abbot) on Dec 11, 2003 at 19:54 UTC
|
Somehow i need to generate OCR code
So, after piecing together all of your posts in this thread, it seems you want to be able to create a PDF document that uses a font, such as OCR-A or OCR-B, that is designed for reading by OCR systems.
You don't explain how you generate your PDF now. If you are using PDF::API2, I think it allows you to use any TrueType Font you want. If you aren't using it, I suggest you review it. I've not used it much myself (I've played a bit), but I've heard good things from people who have. There are some examples available at http://www.penguin.at0.net/~fredo/files/ which you may find helpful.
-sauoq
"My two cents aren't worth a dime.";
|
|
I am using PDF::API2 as well.
I also looked at http://www.penguin.at0.net/~fredo/files .
I did not find any example that uses an OCR font or shows how to change the font.
Do you have an example?
Thanks,
Tejas
|
|
Can you show us your code that uses PDF::API2?
It should be a simple matter of changing or adding a font definition to your code.
Your code should already contain a font definition; you want it to look something like:
my $doc = PDF::API2->new( -file => 'mypdf.pdf' );
my $page = $doc->page;
my $txt = $page->text;
my $font = $doc->corefont('OCR A',1); # this font name may be incorrect..
$font->encode('latin1');
$txt->font($font,10);
$txt->lead(10);
$txt->text('Print to PDF file in the font set above');
# ....
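For illustration, here is a minimal sketch of the same idea using ttfont() instead of corefont(). It assumes that OCR-A is not one of the fonts corefont() knows about and that you have a TrueType OCR-A file on disk. The file names, the helper names read_number and write_ocr_pdf, the 12-point size, and the page position are all placeholders, so treat this as a starting point rather than a tested recipe.

```perl
use strict;
use warnings;
use PDF::API2;

# Read the first line of a text file and keep only its digits.
# (Hypothetical helper; stripping non-digits is an assumption about the input.)
sub read_number {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;
    $line = '' unless defined $line;
    $line =~ s/\D//g;
    return $line;
}

# Write $number to a one-page PDF using the TrueType font in $font_file.
# (Hypothetical helper; size and position are arbitrary choices.)
sub write_ocr_pdf {
    my ($number, $font_file, $out_file) = @_;

    my $pdf  = PDF::API2->new();
    my $page = $pdf->page;
    $page->mediabox(0, 0, 612, 792);       # US Letter, in points

    my $font = $pdf->ttfont($font_file);   # embed a TrueType font from a file

    my $txt = $page->text;
    $txt->font($font, 12);
    $txt->translate(72, 720);              # one inch from the left, near the top
    $txt->text($number);

    $pdf->saveas($out_file);
    $pdf->end;
}

# Usage (all three names are placeholders):
#   perl ocr_pdf.pl number.txt OCRA.ttf out.pdf
if (@ARGV == 3) {
    my ($in_file, $font_file, $out_file) = @ARGV;
    write_ocr_pdf(read_number($in_file), $font_file, $out_file);
}
```

The number is kept as a string throughout, so leading zeros survive. Whether your OCR reader accepts the result depends on the font file, the print size, and the scanner, none of which this sketch can check.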
|
|
Re: OCR Code
by davido (Cardinal) on Dec 11, 2003 at 18:18 UTC
|
There have been several discussions on OCR (Optical Character Recognition) in the Monastery that I can think of. You might find some of them helpful.
jcwren wrote a little snippet that he posted here which performs very basic OCR on an image file. It's quite limited in its capability, but demonstrates some of the principles involved.
The gist of other threads on the subject seems to be that OCR is a computationally intensive task, and probably better suited to a compiled language.
The fact that there are no modules on CPAN for OCR (at least none that my quick search turned up) would seem to indicate that it's no trivial task.
Update: There is an Image Processing Algorithms module on CPAN that you'll find here which purports to be "designed for solving image analysis and object recognition tasks in Perl." The Quantum Leap of developing an OCR project from scratch may be reduced to an Enormous Hurdle with the aid of this IPA module.
|
|
I looked at the example scripts. This is similar to what I want.
Below is my requirement:
Take a number from a text file, convert it to an OCR font, put it in a PDF file, print out the PDF file, run the printout through an OCR scanner, and get the number back.
Do you have any example?
Thanks, Tejas
Re: OCR Code
by allolex (Curate) on Dec 11, 2003 at 18:12 UTC
|
Hi. "OCR" stands for 'Optical Character Recognition' and is a method for converting a text *image* into machine-readable text (TIFF to ASCII, for example). What exactly are you trying to do with OCR?
|
|
Hi Allolex,
For example, I want to render the number below in an OCR font.
000000378374949304847374074740008484
We can then print the PDF file and pass the printed page through an OCR reader, which will read this number.
Let me know if you need anything else.
Thanks
Tejas
|
|
(Why did someone waste her/his time downvoting the parent? His reply was fine.)
I think I get it now. sauoq seems to have gotten right to the heart of the matter. You're generating a PDF file that is supposed to have a code in a special font designed to be read by OCR systems. (I parsed your mentioning of "OCR code" as "the source code for a program that does OCR".) I guess it's just a question of you finding out how to change the font in your output, really. Unfortunately, I don't know anything about that, but I felt you deserved at least a response.
Re: OCR Code
by CombatSquirrel (Hermit) on Dec 11, 2003 at 18:13 UTC
|
Your question is as unclear as can be. Be more precise: What input format do you have, what would you like to get from it? Why do you need OCR (Optical Character Recognition)? Have you looked at CPAN yet, especially at the PDF modules?
CombatSquirrel.
Entropy is the tendency of everything going to hell.
|
|
Suppose I have a number in a text file. The input font is Arial. I want to render that number in an OCR font.
Then, if I print that page and pass it through an OCR reader, it will read that number.
Let me know if you need anything else.
thanks
Tejas
|
|
You won't need OCR if you have the number in a text file or in a PDF file.
In the case of the text file, a simple Perl program will suffice. In the case of the PDF file, you will have to resort to one of the PDF modules, which I mentioned above. But again: you will not need OCR in this case.
If this does not help you, I would suggest you read How (Not) To Ask A Question and then restate your question, in a way that is easier to understand, as a reply to your original post.
Cheers,
CombatSquirrel.
Entropy is the tendency of everything going to hell.
|
Re: OCR Code
by traveler (Parson) on Dec 11, 2003 at 19:05 UTC
|
Maybe what you want to do is use TrueType "OCR A" or "OCR B" fonts.
--traveler
|
|
I know about the OCR_A and OCR_B fonts.
But how can I use them?
Do you have any example?
Thanks
Tejas
|
|
<big_grin>
I think that first you should Ask the Dark Gods. Certainly that will help.
Make sure to ask them exactly what you've asked us: Take a number from you, convert it to an OCR font, put it in a PDF file, print it out, run the printout through an OCR scanner, and give your number back to you.
If the Dark Gods can't help, purchase some good OCR software and PDF authoring software. It's too much work to reinvent that wheel from scratch.
</big_grin>
Seriously though, the problems are that your specification is unclear, and Perl isn't the best tool for this particular job. OCR interpretation is a "Big Job", and most of the companies that specialize in it charge for their software. ...and I've yet to see flawless OCR that can handle more than a very limited type of input.
Some suggestions have been offered. But honestly, I think you're probably either going about it the wrong way, or trying to crack too big of a nut, or both. If serious OCR conversion is your goal, you're better off paying a couple hundred dollars for a well-developed tool. As mentioned before, the fact that there is virtually nothing on CPAN for OCR (and yet OCR is such a well-known task), should be a strong indicator that in this case, home-made Perl scripting isn't your solution. But maybe I've simply not understood what you're trying to accomplish.
Update: <big_grin> tags added.
|
|
|
I have an example or two, but they would probably be useless to you. How are you creating the PDF? How are you specifying the font now? Are you using perl to control MSWord and writing to a pdf writer, creating the pdf "directly" from perl or some other way?
--traveler