Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

One of the scripts I am working on for my site will take stories people have written in Microsoft Word (or another text editor like it) and display them on my web page. I have two major issues. First, I am not sure how to get all the formatting to display (right now all the formatting just shows up as garbage). Second, I don't know how to get the text to wrap when needed inside the space given. You can see the whole issue at http://www.ffinfo.com/fanfic/ffviii/story.pl?The%20Story%20After%20the%20Story . Right now I am just using a simple open FILE / print <FILE> approach, as I have never dealt with reading text and displaying it. So any help would be great.
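For the plain-text case (once the story is already text rather than a .doc), a minimal sketch of something better than printing the raw file: HTML-escape the text and wrap each blank-line-separated paragraph in <p> tags, so the browser reflows it to the width of the page. It assumes paragraphs are separated by blank lines; the script name and the subs escape_html and text_to_html are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML so the story text is
# shown literally instead of being interpreted as markup.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, before any entities are added
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Split on blank lines and wrap each paragraph in <p>...</p>. Inside a <p>
# the browser treats single line breaks as spaces and wraps to the available width.
sub text_to_html {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # normalise DOS and old Mac line endings
    my @paras;
    for my $chunk (split /\n\s*\n/, $text) {
        $chunk =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @paras, '<p>' . escape_html($chunk) . '</p>' if length $chunk;
    }
    return join "\n", @paras;
}

# Print a CGI-style header plus the HTML for the file named on the command
# line, e.g.  perl show_story.pl story.txt  (names are examples only).
if (!caller && @ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    print "Content-type: text/html\n\n";
    print text_to_html($text), "\n";
}
```

Letting the browser do the wrapping this way avoids fixing a line width on the server side.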

Re: Displaying text files with perl.
by moritz (Cardinal) on Aug 25, 2008 at 18:57 UTC
    Microsoft Word is not a text editor, and the resulting .doc files aren't text files; they are, well, Word documents. You first have to find a tool that extracts the text portions from a Word file.

    There should be quite a few such tools out there, of varying quality, speed, reliability and price.
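Before handing a file to such a tool, a script can check whether an upload really is a binary Word document. A minimal sketch, assuming only the legacy binary .doc format (Word 97-2003) matters: those files are OLE2 compound documents and always start with the same eight signature bytes. The sub name looks_like_word_doc is hypothetical. Other OLE2 files (such as .xls) share the signature, and newer .docx files are ZIP archives, which this check does not detect.

```perl
use strict;
use warnings;

# Signature bytes at the start of every OLE2 compound document, which is
# the container format of binary Word .doc files. Plain text never starts this way.
my $OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Returns true if the file begins with the OLE2 signature.
sub looks_like_word_doc {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $read = read($fh, my $head, 8);    # read only the first 8 bytes
    close $fh;
    return defined $read && $read == 8 && $head eq $OLE2_MAGIC;
}
```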

Re: Displaying text files with perl.
by jethro (Monsignor) on Aug 25, 2008 at 19:24 UTC
Re: Displaying text files with perl.
by Zenshai (Sexton) on Aug 25, 2008 at 19:31 UTC
    Here's a link to a script I've used before to start me off on something similar to what you're doing, that is, converting Word docs to plain text. For wrapping, use one of the aforementioned modules, or try CSS; it's probably easier that way.

    Hope it helps. Good Luck.
Re: Displaying text files with perl.
by dHarry (Abbot) on Aug 25, 2008 at 18:47 UTC

    For the first part, I honestly don't know what you mean. Do you want to get rid of the formatting, or do you want to reproduce the formatting on your web page?

    For the second part, the wrapping: take a look at Text::Wrap (simple and good) or at Text::Reflow, which is more complex but richer in functionality. I use both and am quite happy with them.
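A small sketch of Text::Wrap in use, assuming each paragraph is re-wrapped on its own and that its existing line breaks can be discarded; wrap_paragraph is a hypothetical helper name. The width is set through $Text::Wrap::columns by its full name, as the module's documentation does.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Re-wrap a single paragraph so that no line is wider than $width.
# wrap() takes the indent for the first line, the indent for the
# following lines, and then the text.
sub wrap_paragraph {
    my ($text, $width) = @_;
    local $Text::Wrap::columns = $width;    # line width used by wrap()
    (my $flat = $text) =~ s/^\s+|\s+$//g;   # trim the ends
    $flat =~ s/\s+/ /g;                     # collapse old line breaks into spaces
    return wrap('', '', $flat);
}

print wrap_paragraph("The quick brown fox jumps over the lazy dog", 20), "\n";
```

Text::Reflow aims at nicer-looking breaks, but for fixed-width output Text::Wrap is usually enough.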

    Hope this helps.

      antiword

      Antiword is a free MS-Word reader for Linux, RISC OS, and DOS. It converts the documents from Word 2, 6, 7, 97, 2000, 2002, and 2003 to text, Postscript, and XML/DocBook. Antiword tries to keep the layout of the document intact.
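If antiword is installed, a Perl script can run it and capture the text it prints. A minimal sketch, assuming antiword is on the PATH and that its default behaviour (plain text written to standard output) is what you want; doc_to_text is a hypothetical name.

```perl
use strict;
use warnings;

# Run antiword on a .doc file and return its plain-text output.
sub doc_to_text {
    my ($path) = @_;
    # List form of a piped open: no shell is involved, so odd characters
    # in an uploaded file name cannot be used for shell injection.
    open my $pipe, '-|', 'antiword', $path
        or die "Cannot run antiword: $!";
    my $text = do { local $/; <$pipe> };    # slurp all of the output
    close $pipe or die "antiword failed on $path (exit status $?)\n";
    return $text;
}
```

Usage would be something like my $story = doc_to_text('story.doc'); (file name hypothetical), after which the text still needs HTML-escaping before it is printed into a page.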

Re: Displaying text files with perl.
by Illuminatus (Curate) on Aug 25, 2008 at 20:11 UTC
    Your question is really too broad to answer effectively. As has been previously mentioned, programs like MSWord (or OpenOffice) create and manage 'documents'. 'documents' can be quite complex, including tables, graphs, diagrams, headers/footers, sub/superscripts, etc, etc, etc...
    You have a couple of options here. Both MSWord and OpenOffice allow you to save documents as web pages, often with the caveat that 'some formatting may be lost', so it depends on just how complicated you want to allow displayed documents to be. Both programs also support saving to RTF, and there is a CPAN module for processing RTF files. If you want to be broad, you could probably give your 'upload' page the ability to accept either an HTML page, an RTF file, or a plain-text file for processing.
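A minimal sketch of that "choose by file type" idea, assuming the upload's file name is the only signal used. upload_kind and its return values are hypothetical, and each kind would still need its own processing routine (for example, an RTF converter from CPAN). Since the user controls the file name, a real upload handler should not trust the extension alone.

```perl
use strict;
use warnings;

# Decide how an uploaded file should be handled, based only on its extension.
sub upload_kind {
    my ($filename) = @_;
    return 'html' if $filename =~ /\.html?\z/i;
    return 'rtf'  if $filename =~ /\.rtf\z/i;
    return 'text' if $filename =~ /\.txt\z/i;
    return;    # anything else (e.g. a raw .doc) is rejected
}

for my $name (qw(story.htm chapter1.RTF notes.txt draft.doc)) {
    my $kind = upload_kind($name);
    print "$name => ", (defined $kind ? $kind : 'rejected'), "\n";
}
```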