in reply to pdf to html
I always used pdftohtml:
http://pdftohtml.sourceforge.net/

Then I parsed the HTML for content with HTML::TreeBuilder::XPath. This works particularly well for simple documents, or documents with a standardized structure. You can look at the x/y offset of each element to find the exact piece of information you're looking for.
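A rough sketch of the x/y lookup, in case it helps. It assumes the converted HTML carries absolutely positioned `<p>` elements with `top:` and `left:` values in the `style` attribute; the actual markup depends on your pdftohtml version and flags, so check your own output first. The sample HTML, the `text_in_box` helper, and the box coordinates are all made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample of positioned output. Real pdftohtml markup
# varies by version and flags; inspect your own file before relying on this.
my $html = <<'HTML';
<html><body>
<p style="position:absolute;top:120px;left:80px">Invoice No.</p>
<p style="position:absolute;top:120px;left:300px">2024-0117</p>
<p style="position:absolute;top:160px;left:80px">Total</p>
</body></html>
HTML

# Hypothetical helper: return the text of every positioned <p> whose
# top/left offsets fall inside the given bounding box (in pixels).
sub text_in_box {
    my ($content, %box) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($content);
    my @hits;
    for my $node ($tree->findnodes('//p[@style]')) {
        my $style = $node->attr('style');
        # Skip elements that carry no usable offsets.
        my ($top)  = $style =~ /\btop:\s*(\d+)/  or next;
        my ($left) = $style =~ /\bleft:\s*(\d+)/ or next;
        next if $top  < $box{top_min}  || $top  > $box{top_max};
        next if $left < $box{left_min} || $left > $box{left_max};
        push @hits, $node->as_text;
    }
    $tree->delete;    # free the parse tree
    return @hits;
}

# Pick out whatever sits to the right of the "Invoice No." label.
my @found = text_in_box(
    $html,
    top_min  => 110, top_max  => 130,
    left_min => 250, left_max => 350,
);
print "$_\n" for @found;
```

Matching on a small range rather than an exact offset is deliberate: positions tend to shift by a pixel or two between documents, even when they come from the same template.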