Re^2: How to Extract PDF tables using Perl

So "Td" means table data, like in HTML, and "Tj" is the cell's text?

The PDF you are parsing seems to have preserved semantic information. I suppose this approach depends on the way it was generated.

I doubt this is generally true. (?)

Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)

Je suis Charlie!}

Re^3: How to Extract PDF tables using Perl by ablanke (Monsignor) on May 27, 2016 at 13:38 UTC
Yes, that's right. It always depends on the way the PDF was generated (some PDF tools even position every single character). Maybe the `getPageContentTree` method helps to build a more general solution. The example is based on a solution I've seen.
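For reference, in PDF content streams `Td` moves the text position (relative to the start of the current line) and `Tj` shows a string there, so neither operator carries table semantics on its own. A minimal sketch of how rows could be rebuilt from those two operators, assuming a simple generator that emits one `Tj` per cell: the content-stream fragment and the `rows_from_stream` helper below are hypothetical and made up for illustration, and the regex only handles plain `x y Td` and `(text) Tj` without escaped parentheses. It does not use `getPageContentTree`; it just works on a string.

```perl
use strict;
use warnings;

# Hypothetical, hand-written content-stream fragment (not taken from the
# PDF discussed in this thread).
my $stream = <<'END';
BT
/F1 10 Tf
50 700 Td
(Name) Tj
100 0 Td
(Qty) Tj
-100 -14 Td
(Apple) Tj
100 0 Td
(3) Tj
ET
END

# Group shown strings into rows by their accumulated y position.
# Assumption: every cell on the same row ends up with exactly the same y.
sub rows_from_stream {
    my ($content) = @_;
    my ($x, $y) = (0, 0);
    my %cells_at;    # y => [ [x, text], ... ]

    while ($content =~ /(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\b|\(([^()]*)\)\s*Tj\b/g) {
        if (defined $3) {
            # "(text) Tj": record the string at the current position
            push @{ $cells_at{$y} }, [ $x, $3 ];
        }
        else {
            # "x y Td": offsets are relative, so accumulate them
            $x += $1;
            $y += $2;
        }
    }

    # PDF y grows upward, so the highest y is the first row;
    # within a row, cells are ordered left to right by x.
    return map {
        [ map { $_->[1] } sort { $a->[0] <=> $b->[0] } @{ $cells_at{$_} } ]
    } sort { $b <=> $a } keys %cells_at;
}

print join("\t", @$_), "\n" for rows_from_stream($stream);
```

For this fragment it prints two tab-separated rows, `Name Qty` and `Apple 3`. A PDF whose generator positions every single character would need an extra step to merge neighbouring glyphs into cells first, which is why this is unlikely to work in general.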
Re^4: How to Extract PDF tables using Perl by LanX (Saint) on May 27, 2016 at 15:51 UTC
Thanks, that's interesting ... I'll give it a try next time I need to parse a PDF. :) Cheers Rolf