But my people say it's possible, and they have done it.

Ask them how they did it and then do it that way. Problem solved.

Re^4: How to Extract PDF tables using Perl
by MidLifeXis (Monsignor) on May 11, 2016 at 11:07 UTC

    And then wrap that "how" up into a CPAN module. :-)

    --MidLifeXis

      Possibly, if there were a Perl module to replace pdftohtml -xml, i.e. one that returns character clusters together with their positions.
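      For reference, poppler's pdftohtml -xml writes each text fragment as a <text> element carrying top, left, width, height and font attributes. The following is only an illustrative sketch, not an existing module: parse_pdftohtml_xml is a hypothetical helper, and its regex assumes one <text> element per line in poppler's attribute order, so a real tool should use a proper XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: pull positioned text fragments out of the XML that
# `pdftohtml -xml` produces. Assumes one <text> element per line with the
# attributes in poppler's order (top, left, width, height, font); a robust
# tool should use a real XML parser instead of this regex.
sub parse_pdftohtml_xml {
    my ($xml) = @_;
    my @fragments;
    my $page = 0;
    for my $line (split /\n/, $xml) {
        # Remember which page the following fragments belong to.
        if ($line =~ /<page number="(\d+)"/) {
            $page = $1;
            next;
        }
        next unless $line =~ m{<text top="(-?\d+)" left="(-?\d+)" width="(\d+)" height="(\d+)" font="(\d+)">(.*)</text>};
        my %frag = (page => $page, top => $1, left => $2,
                    width => $3, height => $4, font => $5, text => $6);
        for ($frag{text}) {    # $_ aliases the hash value
            s/<[^>]+>//g;      # drop inline <b>/<i> markup
            s/&lt;/</g;        # decode the basic XML entities,
            s/&gt;/>/g;
            s/&quot;/"/g;
            s/&amp;/&/g;       # &amp; last so "&amp;lt;" ends up as "&lt;"
        }
        push @fragments, \%frag;
    }
    return @fragments;
}

# Usage on a tiny hand-written sample of pdftohtml -xml output.
my $sample = <<'XML';
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="72" width="40" height="14" font="0">Name</text>
<text top="100" left="200" width="30" height="14" font="0"><b>Qty</b></text>
</page>
XML

for my $f (parse_pdftohtml_xml($sample)) {
    printf "p%d (%d,%d) %s\n", @$f{qw(page left top text)};
}
```

      This prints "p1 (72,100) Name" and "p1 (200,100) Qty", i.e. exactly the "character clusters by position" a pure-Perl replacement would need to deliver.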

      Then one could try to combine histograms of word positions with further user hints, like the font or the region of the page the table occupies.
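      One way to read "histograms of word positions" is a horizontal projection profile: count how many words cover each x coordinate, and treat sufficiently wide empty stretches as the gutters between columns. This is a minimal sketch under that assumption; guess_columns and its default $min_gap of 5 units are hypothetical, and words are assumed to be hashrefs with left, width and top as pdftohtml -xml reports them.

```perl
use strict;
use warnings;

# Hypothetical helper: guess table columns from a projection profile.
# Each word is a hashref with integer `left` and `width`. Returns a list
# of [start_x, end_x] pairs; an empty stretch of at least $min_gap units
# between covered positions is treated as a column gutter.
sub guess_columns {
    my ($words, $min_gap) = @_;
    $min_gap //= 5;

    # Histogram: how many words cover each integer x position.
    my @hist;
    for my $w (@$words) {
        $hist[$_]++ for $w->{left} .. $w->{left} + $w->{width} - 1;
    }

    # Walk the histogram, opening a column at the first covered x and
    # closing it when the gap to the next covered x reaches $min_gap.
    my @cols;
    my ($start, $end);
    for my $x (0 .. $#hist) {
        next unless $hist[$x];
        if (defined $start && $x - $end - 1 >= $min_gap) {
            push @cols, [$start, $end];
            undef $start;
        }
        $start //= $x;
        $end = $x;
    }
    push @cols, [$start, $end] if defined $start;
    return @cols;
}

my @words = (
    { top => 100, left => 72,  width => 40,  text => 'Name'        },
    { top => 100, left => 200, width => 30,  text => 'Qty'         },
    { top => 120, left => 72,  width => 60,  text => 'Widget'      },
    { top => 120, left => 205, width => 20,  text => '3'           },
    { top => 700, left => 100, width => 150, text => 'Page footer' },
);

# A user hint about the page area: only words in the table's vertical band.
my @in_table = grep { $_->{top} >= 90 && $_->{top} <= 130 } @words;

for my $c (guess_columns(\@in_table)) {
    printf "column from x=%d to x=%d\n", @$c;
}
```

      With the area hint this finds two columns (x=72..131 and x=200..229). Without it, the wide footer bridges the gutter and everything collapses into one column, which is why such user hints matter.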

      Not sure whether CAM::PDF can be used to get word positions; the docs only mention "objects", whatever that means...

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!