Re^3: The best library for reading PDF
by almut (Canon) on Mar 15, 2010 at 16:25 UTC
    ... And with TEXT I mean TEXT ...

    Please be aware that even if you only want to extract text, some of the associated problems can't be solved even in principle.  One of them is the glyph-to-character reverse mapping problem.  For a demo, see this pdf.  Although it contains nothing but plain text (i.e. glyphs representing characters from the ASCII character set), the text cannot be extracted, even though it displays just fine.  Try it, for example, with Adobe Reader's "Save as Text" (or try to copy and paste selected text), and you'll see what I mean.
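To illustrate the reverse mapping problem, here is a toy model in Python. It is not a PDF parser and uses no PDF library. It only assumes a simplified picture in which an embedded font subset assigns arbitrary codes to its glyphs. All names (build_subset_font, encode, extract) are made up for this sketch. A viewer only needs the code-to-glyph-shape mapping to draw the page, whereas an extractor needs a code-to-character table, which may simply be missing from the file.

```python
def build_subset_font(text):
    """Assign codes 1, 2, 3, ... to characters in order of first use.

    Hypothetical numbering scheme, standing in for whatever arbitrary
    glyph codes a PDF producer might pick for an embedded font subset.
    """
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = len(code_for_char) + 1
    return code_for_char


def encode(text, code_for_char):
    """Produce the byte string a content stream would carry for this text."""
    return bytes(code_for_char[ch] for ch in text)


def extract(content, to_unicode=None):
    """Turn glyph codes back into characters.

    With a reverse map this is a simple lookup. Without one, the extractor
    can only guess; here it guesses that the codes are Latin-1.
    """
    if to_unicode is None:
        return content.decode("latin-1")
    return "".join(to_unicode[code] for code in content)


text = "Hello PDF"
font = build_subset_font(text)
content = encode(text, font)

# The reverse table that an extractor would need to find in the file.
to_unicode = {code: ch for ch, code in font.items()}

with_map = extract(content, to_unicode)   # recovers the original text
without_map = extract(content)            # control characters, not the text

print(repr(with_map))
print(repr(without_map))
```

In this model the extractor has nothing to go on once the reverse table is absent, short of recognizing the glyph shapes themselves, which is essentially OCR.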

    In other words, no library will work in every case; and since every library I've seen so far has its own specific problems, it's hard to recommend the "perfect" one.  So I'd say just try a few and see for yourself which one works best for the types of PDFs you'll typically be working with.
