in reply to words are merging while extracting the text from pdf
You might want to read Re: CAM::PDF did't extract all pdf's content for some background on why it is so difficult to extract text from .pdf files (in addition to the way of coding that bart is assuming).
In your case, I would suggest running another program (e.g. pdf2txt, or some OCR software) in parallel and comparing the output. Where your program identifies mismatches, you could try plausibility checks and/or dictionary lookups, depending on how much effort you want to spend.
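To make the idea a bit more concrete, here is a rough sketch in Python (not Perl, and not tied to any particular extractor). It assumes you already have the text from two different tools as plain strings. The function names `find_mismatches` and `split_merged` and the tiny word list are made up for illustration; a real word list would be much larger.

```python
import difflib


def find_mismatches(text_a, text_b):
    """Return (tokens_a, tokens_b) pairs where the two extractions disagree."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def split_merged(token, dictionary):
    """Split a merged token into dictionary words; None if no split exists."""
    # best[i] is a list of words that exactly covers token[:i], or None.
    best = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            piece = token[start:end]
            if best[start] is not None and piece.lower() in dictionary:
                best[end] = best[start] + [piece]
                break
    return best[len(token)]


# Hypothetical inputs standing in for the output of two extractors.
from_tool_a = "the quickbrown fox"
from_tool_b = "the quick brown fox"
words = {"the", "quick", "brown", "fox"}  # assumed, illustrative word list

for left, right in find_mismatches(from_tool_a, from_tool_b):
    for token in left:
        print(token, "->", split_merged(token, words), "| other tool:", right)
```

The diff only tells you where the two tools disagree; the dictionary lookup is one possible plausibility check for deciding which side to trust. Tokens that cannot be split into known words come back as `None`, so those are the ones left to check by hand.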
HTH, Rata
Replies are listed 'Best First'.
Re^2: words are merging while extracting the text from pdf
by elef (Friar) on Jan 04, 2011 at 11:56 UTC
Re^2: words are merging while extracting the text from pdf
by sureshrps (Novice) on Jan 04, 2011 at 14:41 UTC