in reply to PDF::OCR2 results not what I was hoping for
If you're running OCR on a form, I think the best approach is to pre-segment the page into the separate areas where text appears. In my experience, multi-column (or, in your case, multi-box) text badly confuses the OCR programs I tried.
Since what you have is basically a form with more or less fixed offsets, I would extract the rectangles in which date, time and location appear and then run OCR on each of those regions separately. Also check the settings of your OCR program to see whether you can tell it to expect a sans-serif font.
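A minimal sketch of that pre-segmentation idea, written in Python for illustration only. Everything here is an assumption rather than part of PDF::OCR2: the page is taken to be already rendered to a grayscale pixel grid, the field names and box coordinates in `FIELDS` are made up and would have to be measured on the real form, and `ocr` stands in for whatever OCR call you actually use.

```python
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) in pixels; right and bottom are exclusive.
Box = Tuple[int, int, int, int]
# A grayscale page as a list of rows, each row a list of pixel values.
Image = List[List[int]]

# Hypothetical layout: measure the real offsets on your own form.
FIELDS: Dict[str, Box] = {
    "date": (40, 120, 300, 160),
    "time": (320, 120, 480, 160),
    "location": (40, 180, 480, 220),
}


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image inside box, leaving the original untouched."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def ocr_fields(
    image: Image,
    fields: Dict[str, Box],
    ocr: Callable[[Image], str],
) -> Dict[str, str]:
    """Run the supplied OCR callable once per field rectangle.

    Each call sees only one small region, so the OCR engine never has
    to work out the multi-box layout of the full page.
    """
    return {name: ocr(crop(image, box)).strip() for name, box in fields.items()}
```

With a real imaging library the same pattern applies: crop each rectangle out of the rendered page and pass the crops to the OCR engine one at a time, ideally in a mode that expects a single line of text if the engine offers one.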
Re^2: PDF::OCR2 results not what I was hoping for
by nysus (Parson) on Feb 08, 2016 at 16:55 UTC
by Corion (Patriarch) on Feb 08, 2016 at 17:02 UTC
by nysus (Parson) on Feb 08, 2016 at 18:35 UTC
by nysus (Parson) on Feb 08, 2016 at 18:17 UTC

Re^2: PDF::OCR2 results not what I was hoping for
by nysus (Parson) on Feb 08, 2016 at 17:03 UTC