in reply to Re^2: PDF::OCR2 results not what I was hoping for
in thread PDF::OCR2 results not what I was hoping for
Reading the documentation of PDF::OCR2, I get the impression that it converts the PDF pages into separate image files using PDF::GetImages and then uses Image::OCR::Tesseract to get the text from the image.
I would change that to add a cropping step in between, so that only the "interesting" part of each page image is passed on to the OCR.
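A minimal, untested sketch of what I mean, assuming (as I read their documentation) that PDF::GetImages::pdfimages returns an array ref of extracted image paths and that Image::OCR::Tesseract::get_ocr takes an image path and returns the text. The cropping uses Image::Magick, which is my own choice and not something PDF::OCR2 does. The names crop_geometry and ocr_cropped_pdf and the region fractions are made up for illustration.

```perl
use strict;
use warnings;

# Build an ImageMagick geometry string ("WxH+X+Y") from a region given as
# fractions of the image size. The fractions are hypothetical placeholders;
# pick them to match where the interesting text sits on your pages.
sub crop_geometry {
    my ($width, $height, $region) = @_;
    my $x = int($width  * $region->{left});
    my $y = int($height * $region->{top});
    my $w = int($width  * $region->{width});
    my $h = int($height * $region->{height});
    return sprintf '%dx%d+%d+%d', $w, $h, $x, $y;
}

# Extract the page images, crop each one, and OCR only the cropped part.
sub ocr_cropped_pdf {
    my ($abs_pdf, $region) = @_;

    # Loaded lazily so crop_geometry() works without these modules installed.
    require PDF::GetImages;
    require Image::Magick;
    require Image::OCR::Tesseract;

    my @texts;
    # Assumption: pdfimages() returns an array ref of absolute image paths.
    for my $file (@{ PDF::GetImages::pdfimages($abs_pdf) }) {
        my $img = Image::Magick->new;
        my $err = $img->Read($file);
        die "Cannot read $file: $err" if "$err";

        my ($w, $h) = $img->Get('width', 'height');
        $img->Crop(geometry => crop_geometry($w, $h, $region));
        $img->Set(page => '0x0+0+0');    # reset the canvas offset left by Crop

        my $cropped = "$file.crop.png";
        $err = $img->Write($cropped);
        die "Cannot write $cropped: $err" if "$err";

        # Assumption: get_ocr() takes an image path and returns its text.
        push @texts, Image::OCR::Tesseract::get_ocr($cropped);
    }
    return \@texts;
}
```

You would call it with something like ocr_cropped_pdf('/abs/path/file.pdf', { left => 0.25, top => 0.5, width => 0.5, height => 0.25 }) and tune the fractions until only the part you care about survives the crop.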
Re^4: PDF::OCR2 results not what I was hoping for
by nysus (Parson) on Feb 08, 2016 at 18:35 UTC
Re^4: PDF::OCR2 results not what I was hoping for
by nysus (Parson) on Feb 08, 2016 at 18:17 UTC