in reply to conversion from doc to html
Another option is to try unoconv, which uses OpenOffice or LibreOffice to do the actual conversion. Doing it all yourself would be a lot of work.
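
For what it's worth, here is a rough sketch of how you might call unoconv from a Python script. It assumes unoconv and LibreOffice (or OpenOffice) are already installed and on your PATH. The function names and the file name `report.doc` are made up for illustration, and the returned output path relies on the assumption that unoconv writes the result next to the input file with the new extension.

```python
import shutil
import subprocess
from pathlib import Path


def build_unoconv_command(source, fmt="html"):
    """Return the argument list for converting `source` to `fmt` with unoconv."""
    # -f selects the output format.
    return ["unoconv", "-f", fmt, str(source)]


def convert_doc_to_html(source):
    """Run unoconv on `source` and return the path where the HTML is expected."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv not found on PATH")
    # check=True raises CalledProcessError if unoconv exits with an error.
    subprocess.run(build_unoconv_command(source), check=True)
    # Assumption: the output lands beside the input, with an .html suffix.
    return Path(source).with_suffix(".html")
```

Calling `convert_doc_to_html("report.doc")` should then be roughly the same as running `unoconv -f html report.doc` by hand.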