in reply to pdf to html

I always used pdftohtml:

http://pdftohtml.sourceforge.net/

Then I parsed the resulting HTML with HTML::TreeBuilder::XPath to pull out the content. This works particularly well for simple documents and for documents with a standardized structure, where the same field lands in the same place every time. In that case you can match on an element's x/y offset to find the exact piece of information you're looking for.
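To make the offset idea concrete, here is a rough sketch. It is written in Python with only the standard library, not with HTML::TreeBuilder::XPath, so treat it as an illustration of the technique and not as the code I actually ran. It assumes the converter writes each text fragment as a <p> or <div> carrying an inline style such as "position:absolute;top:104px;left:85px", which is what the positioned ("complex") output typically looks like; check your own output first. The names PositionedText and find_near, the tolerance value, and the sample markup are all made up for the example.

```python
import re
from html.parser import HTMLParser

# Matches "top:104px" / "left:85px" in an inline style, but not
# "margin-top" or "padding-left" (the lookbehind rejects those).
_OFFSET = re.compile(r"(?<![\w-])(top|left)\s*:\s*(-?\d+)")


class PositionedText(HTMLParser):
    """Collect (top, left, text) for each absolutely positioned <p>/<div>.

    Assumption: positions live in an inline style attribute. Elements
    nested inside an already-positioned element are folded into its text.
    """

    def __init__(self):
        super().__init__()
        self.items = []     # finished (top, left, text) tuples
        self._open = None   # [tag, nesting depth, top, left, text chunks]

    def handle_starttag(self, tag, attrs):
        if self._open is not None:
            # Track nesting of the same tag so we close on the right end tag.
            if tag == self._open[0]:
                self._open[1] += 1
            return
        if tag not in ("p", "div"):
            return
        style = dict(attrs).get("style") or ""
        offsets = dict(_OFFSET.findall(style))
        if "top" in offsets and "left" in offsets:
            self._open = [tag, 1, int(offsets["top"]), int(offsets["left"]), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[4].append(data)

    def handle_endtag(self, tag):
        if self._open is None or tag != self._open[0]:
            return
        self._open[1] -= 1
        if self._open[1] == 0:
            _, _, top, left, chunks = self._open
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(chunks).split())
            self.items.append((top, left, text))
            self._open = None


def find_near(items, top, left, tolerance=2):
    """Return the text of every item within `tolerance` pixels of (top, left)."""
    return [text for t, l, text in items
            if abs(t - top) <= tolerance and abs(l - left) <= tolerance]


# Made-up sample in the assumed output shape, not taken from a real PDF.
SAMPLE = """
<div id="page1-div" style="position:relative;width:918px;height:1188px;">
<p style="position:absolute;top:104px;left:85px;white-space:nowrap" class="ft00">Invoice number</p>
<p style="position:absolute;top:104px;left:300px;white-space:nowrap" class="ft00"><b>2024-0117</b></p>
<p style="position:absolute;top:130px;left:85px;white-space:nowrap" class="ft00">Total</p>
</div>
"""

if __name__ == "__main__":
    parser = PositionedText()
    parser.feed(SAMPLE)
    parser.close()
    # Look up whatever sits at the known position of the invoice number value.
    print(find_near(parser.items, top=104, left=300))
```

The small tolerance is there because offsets can shift by a pixel or two between documents that otherwise share a layout. In Perl the same lookup would be an XPath query on the style attribute, followed by reading the text of the matching node.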