PerlMonks  

Re: How to parse PDF

by moritz (Cardinal)
on Aug 24, 2007 at 07:36 UTC (node id 634805)


in reply to How to parse PDF

The simple answer is you have to try it.

Pipe your PDF through the pdftotext tool (on Ubuntu it is in the poppler-utils package) and see whether the output is parsable. That doesn't take long; you can literally test it in two minutes.
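Here is a minimal sketch of that two-minute test in Perl, assuming pdftotext is on your PATH. It runs `pdftotext -layout file.pdf -` (the `-` sends the text to stdout) and splits each line into columns. The sub names and the "two or more spaces separate columns" rule are my own assumptions for illustration, not part of pdftotext or any CPAN module; adjust the splitting to whatever your vendor's PDFs actually look like.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run pdftotext (poppler-utils) on a PDF and return its text as a list
# of lines. '-layout' keeps the physical column layout, '-' writes the
# text to stdout instead of a .txt file next to the PDF.
sub pdf_to_lines {
    my ($pdf) = @_;
    open(my $fh, '-|', 'pdftotext', '-layout', $pdf, '-')
        or die "Cannot run pdftotext: $!";
    chomp(my @lines = <$fh>);
    close($fh) or die "pdftotext failed on $pdf (exit status $?)\n";
    return @lines;
}

# Hypothetical parsing heuristic: skip blank lines (including the form
# feeds pdftotext emits between pages), trim each line, and treat runs
# of two or more spaces as column separators.
sub split_columns {
    my @rows;
    for my $line (@_) {
        next unless $line =~ /\S/;
        (my $trimmed = $line) =~ s/^\s+|\s+$//g;   # copy, don't modify caller's data
        push @rows, [ split /\s{2,}/, $trimmed ];
    }
    return @rows;
}

# Only run pdftotext when a PDF is given on the command line.
if (@ARGV) {
    for my $row (split_columns(pdf_to_lines($ARGV[0]))) {
        print join("\t", @$row), "\n";
    }
}
```

Run it as `perl try_pdf.pl vendor.pdf` and eyeball the tab-separated output: if the columns line up with the fields you expect, the text route is probably good enough; if not, try the modules below or go back to the vendor.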

Also take a look at the PDF::Parse and PDF modules and see whether they help you.

In principle, though, it is much easier to validate the data before it goes into a PDF. Have you tried asking the external vendor whether they could provide the same data in a more easily accessible format?

Re^2: How to parse PDF
by Anonymous Monk on Feb 13, 2009 at 07:35 UTC
    Excellent utility, especially when using the -layout option. Well done!
Re^2: How to parse PDF
by Anonymous Monk on Jan 20, 2012 at 19:41 UTC
    Great tip on pdftotext, thank you!