One other alternative: get OpenOffice, open your Word documents in it, and save them as HTML files. OpenOffice seems to do a much better job of keeping HTML files free from junk tags. I believe OpenOffice is also much easier to automate, and it has a powerful and consistent API (I haven't tried it, but that's my impression from the docs), so you could do all of this automatically from your Perl program.
If you do decide to install it, keep in mind that OpenOffice will try to change the file associations for your office documents, and it is quite a pain to get them back to their original state.
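For what it's worth, here is a rough, untested-against-your-setup sketch of how the automated conversion could look from Perl. Note that it does not use the OpenOffice API mentioned above; it just shells out to the office binary in headless mode. It assumes a `soffice` executable on your PATH that accepts the LibreOffice-style `--headless --convert-to html --outdir` options (older OpenOffice builds may not support these), and the names `build_convert_command`, `expected_html_path`, `convert_to_html`, the `SOFFICE` environment variable, and the `html_out` directory are all made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Assumption: a LibreOffice-style "soffice" binary is on the PATH.
# Set the (hypothetical) SOFFICE environment variable to point elsewhere.
my $SOFFICE = $ENV{SOFFICE} || 'soffice';

# Build the argument list for a headless conversion of one document.
sub build_convert_command {
    my ($doc, $outdir) = @_;
    return ($SOFFICE, '--headless', '--convert-to', 'html',
            '--outdir', $outdir, $doc);
}

# Work out where the converted file should land:
# same base name as the input, with an .html suffix, inside $outdir.
sub expected_html_path {
    my ($doc, $outdir) = @_;
    my ($name) = fileparse($doc, qr/\.[^.]*/);
    return File::Spec->catfile($outdir, "$name.html");
}

# Run the conversion and return the path of the generated HTML file.
sub convert_to_html {
    my ($doc, $outdir) = @_;
    # The list form of system() bypasses the shell,
    # so file names containing spaces are passed through intact.
    system(build_convert_command($doc, $outdir)) == 0
        or die "soffice failed for $doc (exit status $?)\n";
    my $html = expected_html_path($doc, $outdir);
    -e $html or die "expected output $html was not created\n";
    return $html;
}

# Convert every file named on the command line into ./html_out.
if (@ARGV) {
    my $outdir = 'html_out';
    -d $outdir or mkdir $outdir or die "cannot create $outdir: $!\n";
    for my $doc (@ARGV) {
        my $html = convert_to_html($doc, $outdir);
        print "$html\n";
    }
}
```

You would run it as something like `perl word2html.pl report.doc notes.doc` (the script name is arbitrary). Running in headless mode this way should also avoid having to open each document by hand, though I can't promise the resulting HTML is equally clean for every document.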
In reply to Re: Dealing with Word Compact HTML
by relax99
in thread Dealing with Word Compact HTML
by apessos