in reply to Re^2: The best library for reading PDF
in thread The best library for reading PDF
... And with TEXT I mean TEXT ...
Please be aware that even if you only want to extract text, there are still a few issues that can't be solved in principle. One of them is the glyph-to-character reverse mapping problem: a PDF stores codes that select glyphs in a font, and unless the file also says which character each code stands for, there is no reliable way back from glyph to character. For a demo, see this pdf. It contains nothing but plain text (i.e. glyphs representing characters from the ASCII character set) and displays just fine, yet the text cannot be extracted. Try it with Adobe Reader's "Save as Text", or try to cut-n-paste selected text, and you'll see what I mean.
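To make the reverse mapping problem concrete, here is a toy model in Python. It is only a sketch and does not parse real PDFs: the function names (`build_subset_encoding`, `encode`, `render`, `extract`) are made up for illustration, and I'm assuming a subsetted font that hands out codes in order of first use, which is one common arrangement but not the only one.

```python
def build_subset_encoding(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use.

    Assumption: this stands in for a subsetted font with a custom
    encoding. The codes have no relation to ASCII values.
    """
    unique_chars = dict.fromkeys(text)  # keeps first-use order
    return {ch: code for code, ch in enumerate(unique_chars, start=1)}


def encode(text, char_to_code):
    """Produce the byte string that would sit in the page content."""
    return bytes(char_to_code[ch] for ch in text)


def render(content, code_to_glyph):
    """A viewer only needs code -> glyph shape, so display always works.

    The glyph shape is represented here by the character it draws.
    """
    return "".join(code_to_glyph[b] for b in content)


def extract(content, to_unicode=None):
    """An extractor needs code -> character, which the file may not carry."""
    if to_unicode is None:
        # No reverse map available: all we can do is guess an encoding.
        return content.decode("latin-1")
    return "".join(to_unicode[b] for b in content)


text = "Hello, PDF"
char_to_code = build_subset_encoding(text)
code_to_glyph = {code: ch for ch, code in char_to_code.items()}
content = encode(text, char_to_code)

print(repr(render(content, code_to_glyph)))   # what you see on screen
print(repr(extract(content)))                 # what you get without a reverse map
print(repr(extract(content, code_to_glyph)))  # what you get when the map is present
```

The first and third lines print the original text; the second prints control characters. In the toy model the reverse map happens to be the same table as the glyph table, but in a real file the glyph table holds outlines, not characters, so a missing reverse map cannot be rebuilt from it.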
In other words, no library will work in every case, and since all the libraries I've seen so far have their own specific problems, it's hard to recommend the "perfect" one. So I'd say just try a few and see for yourself which one works best for the types of PDFs you'll typically be working with.
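If you want to compare a few libraries on your own files without eyeballing every result, a crude check along these lines may help. It is a sketch under assumptions: `plausibility` and `rank_extractions` are hypothetical helpers, the library names in the sample data are placeholders, and the extracted strings are assumed to come from whatever libraries you are trying. It also assumes mostly ASCII text, and it will not notice a mapping that is wrong but still printable.

```python
import string

# Characters we expect in ordinary ASCII text (assumption: ASCII documents).
_PLAUSIBLE = set(string.ascii_letters + string.digits + string.punctuation + " \n\t")


def plausibility(text):
    """Fraction of characters that look like ordinary text, from 0.0 to 1.0."""
    if not text:
        return 0.0
    return sum(ch in _PLAUSIBLE for ch in text) / len(text)


def rank_extractions(results):
    """Sort library names by how plausible their extracted text looks.

    `results` maps a library name to the text it extracted from one file.
    """
    return sorted(results, key=lambda name: plausibility(results[name]), reverse=True)


# Placeholder data: in practice, fill this from the libraries under test.
sample = {
    "library_a": "Hello, PDF",
    "library_b": "\x01\x02\x03\x03\x04",
    "library_c": "",
}
for name in rank_extractions(sample):
    print(name, round(plausibility(sample[name]), 2))
```

Run it over a handful of representative PDFs rather than a single file, since a library that copes with one font setup may fail on the next.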