in reply to Re: How to Extract PDF tables using Perl
in thread How to Extract PDF tables using Perl

:D :D yeah 3016

Thanks for the reply.
The problem here is that the table is dynamic.

So there may be 3 labels or 30 labels, like Date, Value1 and Value2,
or there may be many more.

Some of them might be undefined.

Are there any modules that might help me parse a PDF table?

So far, CAM::PDF and PDF::API2 do not seem to have a feature for reading a table inside a PDF, only for creating a new one.

Main problem: the values get mixed together and printed on a single line.
1.) Some of these values might not be defined (just empty sets),

and the labels keep changing, so they are not static at all.


Any advice or ideas on modules, or on how to do it, please?

Re^3: How to Extract PDF tables using Perl
by Anonymous Monk on May 25, 2016 at 05:54 UTC

    i use perl, but when trying to do something similar i found that python3 + pdfquery was easier to get working & it did the column parsing...

    http://www.markhneedham.com/blog/2015/01/22/pythonpdfquery-scraping-the-fifa-world-player-of-the-year-votes-pdf-into-shape/

    i guess the nutshell is: loop over each page in the pdf and search for a matching string; if found, get its x,y coordinates, then use that result with in_bbox(x,y,x2,y2) to scrape whatever other text is inside this bounding box. because i wanted a "row", my bbox was x,y,x+500,y+10 (grid origin at bottom left?)
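    A rough sketch of that row idea in plain Python, without pdfquery, just to show the geometry. Everything here is an assumption for illustration: the `TextItem` type, the `in_bbox` and `row_for_label` helpers and the sample coordinates are made up, and "inside" is taken to mean fully contained, with the origin at the bottom left.

```python
from typing import List, NamedTuple


class TextItem(NamedTuple):
    """Hypothetical text fragment with its bounding box (bottom-left origin)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def in_bbox(items: List[TextItem], x0: float, y0: float,
            x1: float, y1: float) -> List[TextItem]:
    """Return the items whose box lies entirely inside (x0, y0, x1, y1)."""
    return [it for it in items
            if it.x0 >= x0 and it.y0 >= y0 and it.x1 <= x1 and it.y1 <= y1]


def row_for_label(items: List[TextItem], label: str,
                  width: float = 500.0, height: float = 10.0) -> List[str]:
    """Find the first item containing `label`, then collect everything in
    the box x, y, x+width, y+height, ordered left to right."""
    for it in items:
        if label in it.text:
            row = in_bbox(items, it.x0, it.y0, it.x0 + width, it.y0 + height)
            return [r.text for r in sorted(row, key=lambda r: r.x0)]
    return []


# Made-up page content: one header line and one data line.
PAGE = [
    TextItem("Date", 50, 700, 80, 709),
    TextItem("Value1", 150, 700, 190, 709),
    TextItem("Value2", 250, 700, 290, 709),
    TextItem("2016-05-25", 50, 680, 110, 689),
    TextItem("42", 150, 680, 165, 689),
    TextItem("7", 250, 680, 258, 689),
]

print(row_for_label(PAGE, "2016-05-25"))  # ['2016-05-25', '42', '7']
```

    The +500 / +10 numbers are the ones from the post; a real PDF would likely need them tuned to its page width and line height.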

    i don't know how it really works, but i was able to copy/paste enough bits to get what i needed

    maybe pdf::api or something has a similar feature to in_bbox? is it maybe like collision detection logic, where given a bounding box you find all the text thingys that collide with it and return an array of those? i'm guessing out my a##
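    For what it's worth, the collision guess can be sketched in a few lines of plain Python. This is not how pdfquery or any Perl module is known to do it; `overlaps`, `colliding` and `row_as_dict` are hypothetical helpers, the boxes are invented, and it assumes each value sits roughly under its header label. Under those assumptions an empty cell simply comes back as None, which is the "undefined value" case from the original question.

```python
def overlaps(a, b):
    """True if two (x0, y0, x1, y1) boxes intersect (bottom-left origin)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def colliding(items, box):
    """Return every item whose box collides with the given box."""
    return [it for it in items if overlaps(it["box"], box)]


def row_as_dict(headers, items, y0, y1):
    """Map each header label to the value found under it in the band
    y0..y1, or None when nothing collides with that column."""
    row = {}
    for h in headers:
        hx0, _, hx1, _ = h["box"]
        hits = colliding(items, (hx0, y0, hx1, y1))
        row[h["text"]] = hits[0]["text"] if hits else None
    return row


# Invented sample: three labels, but the Value1 cell is empty in this row.
HEADERS = [
    {"text": "Date", "box": (50, 700, 80, 709)},
    {"text": "Value1", "box": (150, 700, 190, 709)},
    {"text": "Value2", "box": (250, 700, 290, 709)},
]
VALUES = [
    {"text": "2016-05-25", "box": (50, 680, 110, 689)},
    {"text": "7", "box": (250, 680, 258, 689)},
]

print(row_as_dict(HEADERS, VALUES, 680, 690))
# {'Date': '2016-05-25', 'Value1': None, 'Value2': '7'}
```

    Because the header list is just data, it does not matter whether there are 3 labels or 30; the same loop would apply.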

    sorry if this doesn't help

      doh - one minor note:
      x = float(thing.get('x0')) doesn't work(?) in the latest pdfquery - per the pdfquery docs, use x = float(thing.attr('x0')) instead - maybe 'get' was replaced with 'attr'
      Thank you so much Anonymous Monk.

      Your reply is very valuable, and I hope we Perl developers get a piece of the action with an equivalent module in Perl.
      Looking forward to it.
      Peace.