gmpassos has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!

I'm looking for a way to take some .doc files (MS Word 97+ documents) and convert them automatically to HTML! The documents can only contain text and tables (which should be easier to convert).

This is for a management system in which the documents that users need to fill in and send to the directors in the main office live on the internet. The user opens the HTML version of the document online and fills in the inputs of the form, instead of opening it in Word and typing everything. This is good because, when the user submits the (HTML) form, the names and values of the inputs are captured and saved in a DB. With the Word version, the data has to be entered into the DB by hand, by reading the printed copy.

But why Word? The directors like to write their documents in it, and don't know HTML. We know that you can save a .doc as .html from within Word, but we found that teaching this to everyone who writes the documents doesn't work for all of them. So we want to accept the .doc directly and have the system convert it internally.

I'm looking for a way to do this conversion automatically, preferably one that works on Linux. But if only a Win32 solution turns up, that's fine.

Graciliano M. P.
"The creativity is the expression of the liberty".

Replies are listed 'Best First'.
(jeffa) Re: There is a way to convert .doc to .html automatically from Perl?
by jeffa (Bishop) on Jun 10, 2003 at 21:01 UTC
    So, the only reason why the directors want to use Word is because they don't want to learn HTML? That's exactly what WYSIWYG editors were created for. Macromedia's Dreamweaver is one of the best commercial products out there, but that ultra cool web browser Mozilla ships with one for free (Composer). If the directors already know how to save and upload Word documents, they shouldn't have too much trouble doing the same with HTML documents created with a tool not unlike Word.

    I have successfully converted Word documents to HTML with wvWare, by the way. There are some nice tools in that library. Also, check out htmlarea. It works wonders if your 'clients' use Internet Exploder. There is a version 3 in the works that will work on other browsers (and the Mac), but in the meantime, only IE is guaranteed to work ... and work it does. :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: There is a way to convert .doc to .html automatically from Perl?
by gellyfish (Monsignor) on Jun 10, 2003 at 20:24 UTC

    Well, on Windows you could automate Word using Win32::OLE to make Word save the document as HTML, or, if you wanted to get fancier, you could read the elements of the document yourself and construct the HTML from them.
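
    A minimal sketch of the Win32::OLE route, assuming Word 2000 or later is installed on the Windows machine (Word 97 did not have a built-in HTML save format). The sub names html_path_for and doc_to_html_via_word are made up for this illustration; the value 8 is Word's wdFormatHTML constant.

```perl
use strict;
use warnings;
use File::Spec;

# wdFormatHTML from Word's WdSaveFormat enumeration (Word 2000 and later).
use constant WD_FORMAT_HTML => 8;

# Hypothetical helper: derive the output name by swapping .doc for .html.
sub html_path_for {
    my ($doc) = @_;
    my $html = $doc;
    $html =~ s/\.doc\z/.html/i or $html .= '.html';
    return $html;
}

# Hypothetical helper: drive Word through OLE and save one document as HTML.
sub doc_to_html_via_word {
    my ($doc) = @_;
    require Win32::OLE;    # loaded lazily; only available on Windows

    # The second argument names the method called when the object is destroyed.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die "Cannot start Word: " . Win32::OLE->LastError . "\n";
    $word->{Visible} = 0;

    # Word wants absolute paths, so resolve relative ones first.
    my $in  = File::Spec->rel2abs($doc);
    my $out = html_path_for($in);

    my $document = $word->Documents->Open($in)
        or die "Cannot open $in: " . Win32::OLE->LastError . "\n";
    $document->SaveAs($out, WD_FORMAT_HTML);
    $document->Close(0);    # 0 = wdDoNotSaveChanges
    return $out;
}
```

    Calling doc_to_html_via_word('forms\report.doc') should leave report.html next to the original. The HTML that Word writes tends to be verbose, so it may need cleaning up before you add form inputs to it.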

    On a Linux machine I would suggest using the program mswordview, which can be found at http://wvware.sourceforge.net/. Of course this is not Perl, but you can call it from your Perl program.
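
    Calling the converter from Perl could look roughly like this. It assumes the wvWare package has installed its wvHtml wrapper script on the PATH and that your version accepts the --targetdir option; check the man page of whatever you actually have installed. The sub names wv_html_command and convert_with_wv are made up for this illustration.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: build the argument list for wvHtml.
# Assumed usage: wvHtml --targetdir=DIR input.doc output.html
sub wv_html_command {
    my ($doc, $target_dir) = @_;
    my ($name) = fileparse($doc, qr/\.doc\z/i);    # base name without .doc
    return ('wvHtml', "--targetdir=$target_dir", $doc, "$name.html");
}

# Hypothetical helper: run the conversion and return the path of the HTML file.
sub convert_with_wv {
    my ($doc, $target_dir) = @_;
    my @cmd = wv_html_command($doc, $target_dir);

    # The list form of system() bypasses the shell, so spaces and other
    # odd characters in uploaded file names are passed through untouched.
    system(@cmd) == 0
        or die "wvHtml failed for $doc (exit status " . ($? >> 8) . ")\n";
    return "$target_dir/$cmd[-1]";
}
```

    Building the command in its own sub keeps the part that touches the external program small, and lets you check the arguments without having wvWare installed.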

    /J\
    
Re: There is a way to convert .doc to .html automatically from Perl?
by jplindstrom (Monsignor) on Jun 10, 2003 at 21:10 UTC
    Antiword converts to text or PostScript. From your description, I can't really tell if that's good enough for you, but give it a try!
    Antiword is a free MS Word reader for Linux and RISC OS. There are ports to BeOS, OS/2, Mac OS X, Amiga, VMS, NetWare, EPOC and DOS. Antiword converts the binary files from Word 2, 6, 7, 97, 2000 and 2002 to plain text and to PostScript(TM).
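
    If plain text turns out to be enough, a rough sketch of wrapping antiword's output in an HTML page might look like this. It assumes antiword is on the PATH and prints the document text to standard output when given a file name; the sub names are made up for this illustration, and tables come out only as pre-formatted text.

```perl
use strict;
use warnings;

# Escape the three characters that would otherwise be read as markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities stay intact
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap plain text in a minimal page, keeping the layout.
sub text_to_html_page {
    my ($title, $text) = @_;
    return "<html><head><title>" . escape_html($title) . "</title></head>\n"
         . "<body><pre>\n" . escape_html($text) . "</pre></body></html>\n";
}

# Hypothetical helper: read antiword's output through a pipe.
# List-form pipe open avoids the shell (Unix; newer Perls on Windows).
sub antiword_to_html {
    my ($doc) = @_;
    open(my $fh, '-|', 'antiword', $doc)
        or die "Cannot run antiword: $!\n";
    my $text = do { local $/; <$fh> };    # slurp everything at once
    close($fh) or die "antiword failed for $doc\n";
    return text_to_html_page($doc, defined $text ? $text : '');
}
```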

    /J

Re: There is a way to convert .doc to .html automatically from Perl?
by grantm (Parson) on Jun 11, 2003 at 09:32 UTC

    You could waste a lot of time trying to achieve this with free software. If they paid for a word processor package then they might as well pay for a conversion package. I'd recommend R2Net from Logictran. I'm not affiliated with them, but the product and their service are great and very good value. You can automate the process with Perl.

Re: There is a way to convert .doc to .html automatically from Perl?
by cfreak (Chaplain) on Jun 11, 2003 at 13:38 UTC

    I had a similar problem to tackle recently. The aforementioned wvWare is definitely what you want to use. The HTML was pretty good but not perfect. I had better luck maintaining the look of the documents by converting them to PDF. YMMV.

    Lobster Aliens Are attacking the world!