Greetings, brothers and sisters.

I have a very complicated problem and I was hoping that Perl could give me a hand.

I've seen a lot of nodes about MSWord files, but all of them seem to point to Win32::OLE, which is a great module, but I don't believe it will help me, since I need (if at all possible) to do this on a Linux server.

Here's the scenario:

It's basically a file server, which has to control the documents published on it through a web interface.

So, when a user submits a document (typically an MSWord file), the application needs to append a customized header to this file. Please note that by "header" I mean an MSWord document header: basically a table with some information, the company logo, etc.

Problem #1: How to manipulate MSWord files without "opening" MSWord via Win32::OLE?

Then, when a user clicks on the file link, they should be able to view the document in the MSWord-MSIE integrated interface, BUT they shouldn't be able to change the file.

The only way the user can change the contents of the document is by downloading it first. After making the changes, the user should re-submit the document to the system, which will then append a new header to it, and so on...

Problem #2: How to prevent an MSWord document opened in the MSWord-MSIE integrated interface from being changed?

A possible solution to both problems is to convert the contents of the file to another format, like HTML, and let the user view the contents directly in MSIE.

The .doc file would only be available to the user via download, so that they can make changes.

This solution has a problem, however: the conversion must be perfect.

Is there a module that can do a ***decent*** doc -> html conversion? By decent I mean one that preserves tables, inline images, etc.

As you can see I'm pretty much lost here, so I was hoping that someone could give me some advice or alternatives.

TIA,


In reply to Manipulating MSWord files in a linux box? by DaWolf
