You might want to read Re: CAM::PDF did't extract all pdf's content for some background on why it is so difficult to extract text from PDF files (in addition to the way of coding bart is assuming).
In your case, I would suggest running another program (e.g. pdf2txt, or some OCR software) in parallel and comparing the output. If your program finds mismatches between the two outputs, you could try to resolve them with plausibility checks and/or dictionary lookups ... depending on how much effort you want to spend.
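A rough sketch of that idea, in Python only for brevity (the same logic ports to Perl directly). It assumes you already have the text from both extractors as plain strings. The function names `find_mismatches` and `split_merged` and the tiny word list are made up for illustration and do not come from CAM::PDF or any other tool.

```python
import difflib


def find_mismatches(text_a, text_b):
    """Return (tokens_a, tokens_b) pairs where the two extractions disagree."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def split_merged(token, dictionary):
    """Split a merged token into dictionary words.

    Returns the split with the fewest words, or None if the token
    cannot be covered completely by dictionary words.
    """
    lower = token.lower()
    # best[i] is a list of words covering lower[:i], or None if impossible
    best = [None] * (len(lower) + 1)
    best[0] = []
    for end in range(1, len(lower) + 1):
        for start in range(end):
            piece = lower[start:end]
            if best[start] is not None and piece in dictionary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1]


# Hypothetical word list; in practice load a real dictionary file.
words = {"words", "are", "merging", "extract", "the", "text"}

for from_a, from_b in find_mismatches("words are merging", "words aremerging"):
    for token in from_b:
        print(token, "->", split_merged(token, words))
```

Preferring the split with the fewest words is only one possible plausibility check. A real word list will also contain many short words, so expect some wrong splits and review anything the dictionary cannot decide.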
HTH, Rata

In reply to Re: words are merging while extracting the text from pdf
by Ratazong
in thread words are merging while extracting the text from pdf
by sureshrps