in reply to How to parse PDF
Pipe your PDF through the pdftotext tool (on Ubuntu it is in the poppler-utils package) and see whether the output is parsable. That doesn't take long; you can test it in literally two minutes.
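If you want to script that quick test, here is a minimal sketch in Python. It assumes pdftotext is on your PATH; the helper names (`pdftotext_command`, `extract_text`, `non_empty_lines`) are made up for illustration and are not part of any library. The `-layout` option asks pdftotext to keep the physical layout of the page, and `-` as the output file sends the text to stdout.

```python
import shutil
import subprocess
import sys


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argv list for pdftotext; "-" sends the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # keep the physical column layout of the page
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext on one file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found (poppler-utils on Ubuntu)")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        encoding="utf-8",  # assumption: pdftotext's default UTF-8 output
        check=True,        # raise if pdftotext exits with an error
    )
    return result.stdout


def non_empty_lines(text):
    """Split extracted text into stripped, non-empty lines for inspection."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the first lines so you can judge by eye whether they are parsable.
    for line in non_empty_lines(extract_text(sys.argv[1]))[:20]:
        print(line)
```

Run it with the path to one of the vendor's files as the only argument. If the printed lines already look like clean records, a few regexes on top of this may be all you need.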
Take a look at the PDF::Parse and PDF modules and see if they help you.
But in principle it is much easier to validate the data before it is put into a PDF. Have you tried asking the external vendor whether they could provide the same data in a more easily accessible format?
Re^2: How to parse PDF
by Anonymous Monk on Feb 13, 2009 at 07:35 UTC

Re^2: How to parse PDF
by Anonymous Monk on Jan 20, 2012 at 19:41 UTC