Microsoft Word is not a text editor, and the resulting .doc files aren't text files; they are, well, Word documents. You first have to find a tool that extracts the text portions out of a Word file.
There should be quite a few such tools out there, with varying quality, speed, reliability and price.
Here's a link to a script I've used before to start me off on something similar to what you're doing, that is, converting Word docs to plain text. For wrapping, use one of the aforementioned modules, or try CSS; it's probably easier that way.
Hope it helps. Good luck.
For the first part, I don't have a freaking idea what you mean. Do you want to get rid of the formatting, or do you want to reproduce the formatting on your web page?
For the second part: the wrap: take a look at Text::Wrap (simple and good) or take a look at Text::Reflow which is more complex but richer in functionality. I use both and I am quite happy with them.
Hope this helps.
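To make the wrapping step concrete, here is a minimal sketch. It is not Text::Wrap or Text::Reflow; it uses Python's standard `textwrap` module as a stand-in for the same idea. The function name `wrap_paragraphs` and the default width of 72 are made up for illustration, and it assumes paragraphs in the extracted text are separated by blank lines.

```python
import textwrap


def wrap_paragraphs(text, width=72):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns.

    Hypothetical helper: assumes paragraphs are separated by blank lines,
    which may not hold for every extraction tool.
    """
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    wrapped = []
    for paragraph in paragraphs:
        # Collapse existing line breaks and runs of spaces before refilling.
        flat = " ".join(paragraph.split())
        wrapped.append(textwrap.fill(flat, width=width))
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    sample = "This is a long line of text extracted from a document. " * 3
    print(wrap_paragraphs(sample, width=40))
```

Words longer than the width are broken by `textwrap` by default, so no output line should exceed the chosen width.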
antiword
Antiword is a free MS-Word reader for Linux, RISC OS, and DOS. It converts documents from Word 2, 6, 7, 97, 2000, 2002, and 2003 to text, PostScript, and XML/DocBook. Antiword tries to keep the layout of the document intact.
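For what it's worth, calling antiword from a script could look roughly like the sketch below. It assumes an `antiword` binary is installed and on the PATH, that running it with just a file name prints the plain-text conversion to standard output, and that the output can be decoded as UTF-8 (the actual encoding depends on how antiword is configured). The names `build_command` and `doc_to_text` are hypothetical.

```python
import subprocess


def build_command(doc_path):
    """Return the argument list for converting one Word file to text."""
    return ["antiword", doc_path]


def doc_to_text(doc_path, run=subprocess.run):
    """Run antiword on `doc_path` and return whatever it prints.

    `run` is injectable so the function can be exercised without
    antiword installed. Decoding as UTF-8 is an assumption; undecodable
    bytes are replaced rather than raising.
    """
    result = run(build_command(doc_path), capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(doc_to_text(sys.argv[1]))
```

Passing the arguments as a list avoids shell quoting problems with uploaded file names, which matters if the names come from a web form.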
Your question is really too broad to answer effectively. As has been previously mentioned, programs like MSWord (or OpenOffice) create and manage 'documents'. 'Documents' can be quite complex, including tables, graphs, diagrams, headers/footers, sub/superscripts, etc.
You have a couple of options here. Both MSWord and OpenOffice allow you to save documents as web pages. This often comes with the caveat that 'some formatting may be lost'. It will then depend on just how complicated you want to allow displayed documents to be. Also, both aforementioned programs support saving to RTF, and there is a CPAN module for processing RTF files. You could probably give your 'upload' page the ability to accept either an HTML page, an RTF file, or a plain-text file for processing, if you want to be broad.
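One way to picture the "accept several formats" idea is a small dispatcher keyed on the uploaded file's extension. This is only a sketch in Python using the standard library; `upload_to_text` and `html_to_text` are hypothetical names, the HTML handling is deliberately crude (it drops tags plus script/style content and collapses whitespace), and RTF is left unimplemented because it needs a dedicated parser such as the CPAN module mentioned above.

```python
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(markup):
    """Strip tags and collapse whitespace. All layout is lost, and text
    split by inline tags inside a single word gains a space."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def upload_to_text(filename, content):
    """Pick a conversion based on the file extension (an assumption:
    a real upload handler should not trust the extension alone)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".txt":
        return content
    if suffix in (".html", ".htm"):
        return html_to_text(content)
    if suffix == ".rtf":
        raise NotImplementedError("RTF needs a dedicated parser")
    raise ValueError(f"unsupported upload type: {suffix or filename}")
```

Rejecting unknown extensions outright keeps the upload page from silently treating a binary .doc file as text.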