Well said.
This probably won't be any use, but here goes anyway: pdftotext (part of the xpdf PDF viewer) can programmatically convert PDF to "formatted" txt. All it takes is
It approximates the original layout by inserting spaces in the txt.
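
If you want to drive it from a script, something like the following might do. This is only a rough sketch: it assumes pdftotext is installed and on your PATH, and that its -layout option is the one that gives the space-padded output described here. The function names and file names are made up for illustration.

```python
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Build the argument list for a layout-preserving pdftotext run."""
    # -layout asks pdftotext to keep the physical page layout as best it can
    # (assumed here to be the mode that pads the text with spaces).
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext on pdf_path and write the result to txt_path."""
    # check=True raises CalledProcessError if pdftotext exits with an error;
    # FileNotFoundError is raised if pdftotext is not installed.
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
```

Calling pdf_to_text("input.pdf", "output.txt") would then write the text version next to the PDF (both file names are placeholders).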
As you need HTML, you're probably better off with pdf2svg; this is just a note in case pdf2svg fails or whatever.