
Research the file format for word docs and roll your own solution

The "new" DOCX format is a ZIP archive of XML files plus other parts such as embedded images. One of the XML files, word/document.xml, contains the main document text. Archive::Zip can pack and unpack ZIP archives; XML::LibXML and several other modules can process XML.
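A minimal sketch of that DOCX route, assuming Archive::Zip and XML::LibXML are installed. It reads word/document.xml out of the archive and joins the text of the w:t elements, one output line per w:p paragraph. The sub name docx_text is made up for illustration. Headers, footers and footnotes live in other parts of the archive and are ignored here, as are tabs and line breaks, so treat it as a starting point rather than a complete converter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use XML::LibXML;

# Namespace of the WordprocessingML elements (w:p, w:r, w:t, ...).
my $W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

# Hypothetical helper: return the plain text of a .docx file,
# one line per paragraph (w:p) in the document body.
sub docx_text {
    my ($path) = @_;

    my $zip = Archive::Zip->new;
    $zip->read($path) == AZ_OK
        or die "Cannot read '$path' as a ZIP archive\n";

    # The main body text lives in word/document.xml inside the archive.
    my $xml = $zip->contents('word/document.xml');
    die "No word/document.xml in '$path'\n" unless defined $xml;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(w => $W_NS);

    my @paragraphs;
    for my $p ($xpc->findnodes('//w:body//w:p')) {
        # Concatenate the text of every w:t run inside this paragraph.
        push @paragraphs,
            join '', map { $_->textContent } $xpc->findnodes('.//w:t', $p);
    }
    return join "\n", @paragraphs;
}

# Usage: perl docx_text.pl file1.docx [file2.docx ...]
print docx_text($_), "\n" for @ARGV;
```

Going through the DOM instead of pattern-matching the raw XML means namespaces, entities and attributes are handled by the parser.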

The "older" DOC formats are binary blobs. No easy way there, sorry.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^3: Win32::OLE not support Ubuntu 14.10
by marinersk (Priest) on Jun 12, 2015 at 04:20 UTC

    Without looking up the format, I have to say that .DOCX files appear to be a cinch to edit. I just unzipped one, changed into the word directory it created, and popped open document.xml.

    While it has the typical Microsoft Word overabundance of nonlinear code insertion (I did hack one of the early .DOC formats way back when, and remember that observation applying there too), the raw text components seem to sit fairly consistently between <w:t> and </w:t> tags. Before writing production code I'd want to research that a bit, of course, but for a quick-n-dirty word replacement I'd probably be willing to gamble.

    Sample of what I found:

    <w:t xml:space="preserve"> so we can order food. Please include in your RSVP how many people will be attending. </w:t>
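    A quick-n-dirty replacement along those lines might look like the sketch below, again assuming Archive::Zip and XML::LibXML are installed. The docx_replace sub is hypothetical: it edits only the text inside each w:t element through the DOM (so tags and attributes are never touched) and writes the result to a new file rather than overwriting the original. It only finds matches that fall inside a single w:t element; Word often splits a phrase across several runs (the nonlinear insertion mentioned above), and a phrase split that way will not be found.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use XML::LibXML;

my $W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

# Hypothetical helper: replace the literal string $from with $to in every
# w:t element of $in, save the result as $out, and return the number of
# replacements made.
sub docx_replace {
    my ($in, $out, $from, $to) = @_;

    my $zip = Archive::Zip->new;
    $zip->read($in) == AZ_OK or die "Cannot read '$in'\n";

    my $xml = $zip->contents('word/document.xml');
    die "No word/document.xml in '$in'\n" unless defined $xml;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(w => $W_NS);

    my $count = 0;
    for my $t ($xpc->findnodes('//w:t')) {
        my $text = $t->textContent;
        # \Q...\E treats $from as a literal string, not a regex.
        if (my $n = $text =~ s/\Q$from\E/$to/g) {
            $t->removeChildNodes;    # drop the old text node
            $t->appendText($text);   # and put the edited text back
            $count += $n;
        }
    }

    # Store the modified XML back in the archive and write a new file.
    $zip->contents('word/document.xml', $doc->toString);
    $zip->writeToFileNamed($out) == AZ_OK or die "Cannot write '$out'\n";
    return $count;
}

# Usage: perl docx_replace.pl in.docx out.docx old new
if (@ARGV == 4) {
    printf "%d replacement(s)\n", docx_replace(@ARGV);
}
```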