... And with TEXT I mean TEXT ...

Please be aware that even if you only want to extract text, some issues remain that can't be solved in principle.  One of them is the glyph-to-character reverse mapping problem.  For a demo, see this pdf. Although it contains nothing but plain text (i.e. glyphs representing characters from the ASCII character set), the text cannot be extracted, even though it displays just fine.  Try it with Adobe Reader's "Save as Text", or try to copy and paste selected text, and you'll see what I mean.
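To make the reverse mapping problem a bit more concrete, here is a toy sketch in Python. It is not a PDF parser and not how any particular library works. It assumes a hypothetical subset font whose character codes were handed out in order of first use, with no code-to-character table stored alongside it (in a real PDF that table would be something like a ToUnicode CMap). The names `build_subset_font`, `render` and `naive_extract` are made up for illustration.

```python
def build_subset_font(text):
    """Assign character codes 1, 2, 3, ... in order of first use.

    Returns (encode, glyphs): encode maps a character to its code,
    glyphs maps a code to its glyph shape. In this toy the "shape"
    is just the character itself; in a real font it is an outline.
    """
    encode = {}
    glyphs = {}
    for ch in text:
        if ch not in encode:
            code = len(encode) + 1
            encode[ch] = code
            glyphs[code] = ch
    return encode, glyphs


def render(codes, glyphs):
    """What a viewer does: draw the glyph stored for each code."""
    return "".join(glyphs[c] for c in codes)


def naive_extract(codes, to_unicode=None):
    """What an extractor does: turn codes back into characters.

    Without a code-to-character table it can only guess, here by
    treating each code as a character number.
    """
    if to_unicode is None:
        return "".join(chr(c) for c in codes)
    return "".join(to_unicode[c] for c in codes)


text = "Hello"
encode, glyphs = build_subset_font(text)
codes = [encode[ch] for ch in text]  # what the file stores: [1, 2, 3, 3, 4]

displayed = render(codes, glyphs)                    # looks fine on screen
extracted = naive_extract(codes)                     # control characters, not "Hello"
recovered = naive_extract(codes, to_unicode=glyphs)  # works only if a table exists
```

The viewer never needs to know which character a glyph stands for, so `displayed` is fine. The extractor does need to know, and if the producer didn't write that table into the file, there is nothing to recover it from short of recognising the glyph shapes.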

In other words, no library will work every time, and as every library I've seen so far has its own specific problems, it's hard to recommend the "perfect" one.  So I'd say just try a few and see for yourself which one works best for the types of PDFs you'll typically be working with.
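If you want to make the "try a few" step less manual, something along these lines might help. It is only a sketch in Python: the extractors are stand-in functions rather than real libraries, and the score (the share of plain ASCII characters in the output) is a crude assumption of mine that only makes sense for documents you expect to be mostly ASCII text. `readable_ratio` and `rank_extractors` are hypothetical names.

```python
import string

# Characters counted as "readable" for mostly-ASCII documents.
READABLE = set(string.ascii_letters + string.digits + string.punctuation + " \n\t")


def readable_ratio(text):
    """Fraction of characters in text that are plain readable ASCII."""
    if not text:
        return 0.0
    return sum(ch in READABLE for ch in text) / len(text)


def rank_extractors(extractors, sample):
    """Run every extractor on sample and return (name, score), best first.

    extractors maps a name to a callable taking the sample and
    returning the extracted text. An extractor that raises scores 0.
    """
    scores = []
    for name, extract in extractors.items():
        try:
            score = readable_ratio(extract(sample))
        except Exception:
            score = 0.0
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Stand-ins for real libraries: replace each with a wrapper around
# the library you actually want to test.
fake_extractors = {
    "extractor_a": lambda data: "Hello world",
    "extractor_b": lambda data: "\x01\x02\x03\x03\x04",
    "extractor_c": lambda data: "",
}

ranking = rank_extractors(fake_extractors, b"sample bytes")
```

Run it over a handful of PDFs that are typical for your work, not just one, since a library that handles one producer's files well may fall over on another's.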


In reply to Re^3: The best library for reading PDF by almut
in thread The best library for reading PDF by Mechanizator
