As has been pointed out in a number of the excellent replies here, there's no reliable automatic way to do it because the information structures of PDF and HTML are incompatible. However, with a little human interaction and intelligence plugged into the system, it can be made to work (although it's not scalable.) 'pdftotext -layout' will extract the text, and 'pdfimages' will get the images. Once you have those, structuring either (or both) into a reasonable HTML approximation is relatively simple - but does require some thought and a little artistic judgement.
In the (narrow, specialized) case where you know that your PDFs are going to be nothing more than plain text, the process could be automated with "pdftotext -layout -htmlmeta file.pdf". This will produce an HTML file with a reasonable header and the content surrounded by 'pre' tags.
In reply to Re: Convert PDF file into HTML file
by oko1
in thread Convert PDF file into HTML file
by DEIVEEGARAJA
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |