So "Td" means table data, as in HTML, and "Tj" is the cell's text?

The PDF you are parsing seems to have preserved semantic information. I suppose this approach depends on the way the file was generated.

I doubt this is generally true. (?)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re^3: How to Extract PDF tables using Perl
by ablanke (Monsignor) on May 27, 2016 at 13:38 UTC
    Yes, that's right.

    It always depends on the way the PDF was generated (some PDF tools even position every single character).

    Maybe the getPageContentTree method helps to build a more general solution.

    The example is based on the solution I've seen.
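
To make the dependence on the generator concrete, here is a minimal pure-Perl sketch. It is not the example referred to above and does not use getPageContentTree. In the PDF operator set, `Td` moves the text position by an offset relative to the start of the current line, and `Tj` shows a string at that position. The sub name `rows_from_stream` and the sample stream are hypothetical, and the sketch assumes a single uncompressed BT/ET block that uses only `Td` and `Tj` with plain parenthesized strings. A generator that uses other operators, or that positions every character separately, would need a different approach.

```perl
use strict;
use warnings;

# Group the strings shown by Tj into rows, using the positions set by Td.
# Assumption: one BT/ET block, only Td and Tj, literal (...) strings.
sub rows_from_stream {
    my ($stream) = @_;
    my ($x, $y) = (0, 0);    # start of the current text line
    my %cells_at;            # y position => list of [x, text]

    while ($stream =~ /(-?[\d.]+)\s+(-?[\d.]+)\s+Td|\(((?:[^()\\]|\\.)*)\)\s*Tj/g) {
        if (defined $3) {
            # Tj: remember the text at the current position
            push @{ $cells_at{$y} }, [ $x, $3 ];
        }
        else {
            # Td: operands are offsets from the start of the current line
            $x += $1;
            $y += $2;
        }
    }

    my @rows;
    for my $row_y (sort { $b <=> $a } keys %cells_at) {    # top of page first
        my @cells = sort { $a->[0] <=> $b->[0] } @{ $cells_at{$row_y} };
        push @rows, [ map { $_->[1] } @cells ];
    }
    return \@rows;
}

# Hypothetical fragment: two table rows with two cells each.
my $sample = <<'END';
BT
72 700 Td (Name) Tj
100 0 Td (Qty) Tj
-100 -14 Td (Apple) Tj
100 0 Td (3) Tj
ET
END

print join("\t", @$_), "\n" for @{ rows_from_stream($sample) };
```

Cells are grouped by an exact match on the y position, which only works when the generator places every cell of a row on the same baseline.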

      Thanks, that's interesting ... I'll give it a try next time I need to parse a PDF. :)

      Cheers Rolf