http://qs1969.pair.com?node_id=663027

zer has asked for the wisdom of the Perl Monks concerning the following question:

Good evening,

I am trying to read Microsoft Office documents on a Linux platform. I have built working solutions with the Win32 modules, but here I don't have any way to get this done.

I have tried OpenOffice::OODoc, which has no problems with the OpenDocument standard, but I had no luck with MS Word files. Anything that can turn a Word document into OpenDocument format would help.

Thanks

Re: Reading Office Documents
by hipowls (Curate) on Jan 18, 2008 at 09:48 UTC

    I once had a similar problem where I needed to save a document in Word format. I ran OpenOffice from Perl:

    system( '/path/to/soffice', '-invisible', "macro:///Standard.Converters.SaveAsDoc($in,$out)" ) == 0
        or die "soffice failed: $?";
    That invoked the following macro. You'll need to find out what to change "MS Word 97" to; I have to leave something for you to do ;)
    REM  *****  BASIC  *****

    Sub SaveAsDoc( inFile, outFile )
        inURL = ConvertToURL( inFile )
        oDoc = StarDesktop.loadComponentFromURL( inURL, "_blank", 0, _
            Array( MakePropertyValue( "Hidden", True ) ) )
        outURL = ConvertToURL( outFile )
        oDoc.storeToURL( outURL, _
            Array( MakePropertyValue( "FilterName", "MS Word 97" ) ) )
        oDoc.close( True )
    End Sub

    Function MakePropertyValue( Optional cName As String, Optional uValue ) As com.sun.star.beans.PropertyValue
        Dim oPropertyValue As New com.sun.star.beans.PropertyValue
        If Not IsMissing( cName ) Then
            oPropertyValue.Name = cName
        End If
        If Not IsMissing( uValue ) Then
            oPropertyValue.Value = uValue
        End If
        MakePropertyValue = oPropertyValue
    End Function
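
    A minimal sketch of driving that macro from Perl for a batch of files, assuming it is stored as Standard.Converters.SaveAsDoc and that you pass in the path to soffice. The helper names `macro_command` and `convert_with_macro` are hypothetical, and the extension you pass must match whatever filter you put in place of "MS Word 97".

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the system() argument list that makes soffice
# run the Standard.Converters.SaveAsDoc macro on one input/output pair.
sub macro_command {
    my ( $soffice, $in, $out ) = @_;
    # Use absolute paths, because soffice may not share this script's cwd.
    $_ = File::Spec->rel2abs($_) for $in, $out;
    # The macro URL separates its arguments with commas, so a comma in a
    # file name would split it into the wrong arguments.
    die "comma in file name not supported: $in $out\n"
        if "$in$out" =~ /,/;
    return ( $soffice, '-invisible',
        "macro:///Standard.Converters.SaveAsDoc($in,$out)" );
}

# Hypothetical batch driver: convert each file, swapping its extension for
# $ext (which must match the FilterName used in the macro).
sub convert_with_macro {
    my ( $soffice, $ext, @files ) = @_;
    for my $in (@files) {
        ( my $out = $in ) =~ s/\.[^.\/]*\z//;    # strip the old extension
        $out .= ".$ext";
        system( macro_command( $soffice, $in, $out ) ) == 0
            or die "soffice failed on $in: $?\n";
    }
}
```

    For example, `convert_with_macro('/path/to/soffice', 'odt', glob '*.doc');` would attempt one conversion per file and stop at the first failure.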

Re: Reading Office Documents
by Tux (Canon) on Jan 18, 2008 at 14:03 UTC

    If you just need the content, without preserving style and layout, you can use something like:

    open my $f_txt, '-|', 'antiword', $document or die "Cannot run antiword: $!";
    while (<$f_txt>) {
        # read line by line
    }
    close $f_txt;
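
    A slightly more defensive sketch of the same idea: the list form of a piped open skips the shell, so file names with spaces or quotes reach antiword intact, and checking the return value of `close` catches a non-zero exit from antiword. The sub name `word_text` and the optional command override are hypothetical, added only so the pipe logic can be exercised without antiword installed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain text antiword extracts from $document.
# @cmd defaults to ('antiword'); the document path is appended as the last argument.
sub word_text {
    my ( $document, @cmd ) = @_;
    @cmd = ('antiword') unless @cmd;
    # List-form pipe open: no shell is involved, so no quoting problems.
    open my $f_txt, '-|', @cmd, $document
        or die "Cannot run $cmd[0]: $!";
    my @lines;
    while ( my $line = <$f_txt> ) {
        push @lines, $line;    # read line by line
    }
    # close on a pipe waits for the child and returns false if it exited
    # non-zero, leaving the wait status in $?.
    close $f_txt
        or die "$cmd[0] failed on $document (status $?)\n";
    return join '', @lines;
}
```

    Usage is simply `print word_text('report.doc');`.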

    Enjoy, Have FUN! H.Merijn
Re: Reading Office Documents
by starX (Chaplain) on Jan 18, 2008 at 15:30 UTC
    It might not be the answer you're looking for, but both MS Word and OpenOffice can save a document as RTF, which can be read with RTF::Parser. A quick Google search will turn up some batch converters, which may or may not be of real use.
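
    RTF::Parser is the robust route. To give a rough idea of what pulling text out of RTF involves, here is a deliberately naive, self-contained stand-in that does not use RTF::Parser: it walks the control words and {...} groups, drops font and colour tables and other ignorable destinations, and turns \par into newlines. It assumes 8-bit \'hh escapes are Latin-1 and does not decode \uN Unicode escapes, so treat it as a sketch rather than a parser. The name `rtf_to_text` is hypothetical.

```perl
use strict;
use warnings;

# Control words that open destinations whose content is not body text.
my %SKIP_DEST = map { $_ => 1 }
    qw(fonttbl colortbl stylesheet info pict header footer);

# Naive RTF-to-text sketch (NOT RTF::Parser): tracks {...} groups with a
# stack of "skip" flags and emits plain text only outside skipped groups.
sub rtf_to_text {
    my ($rtf) = @_;
    my $out  = '';
    my @skip = (0);    # one flag per open group; the top is the current group
    pos($rtf) = 0;
    while ( pos($rtf) < length $rtf ) {
        if ( $rtf =~ /\G\{/gc ) {
            push @skip, $skip[-1];    # a new group inherits its parent's flag
        }
        elsif ( $rtf =~ /\G\}/gc ) {
            pop @skip if @skip > 1;
        }
        elsif ( $rtf =~ /\G\\\*/gc ) {
            $skip[-1] = 1;            # \* marks an ignorable destination
        }
        elsif ( $rtf =~ /\G\\'([0-9a-fA-F]{2})/gc ) {
            $out .= chr hex $1 unless $skip[-1];    # assumed Latin-1 byte
        }
        elsif ( $rtf =~ /\G\\([\\{}])/gc ) {
            $out .= $1 unless $skip[-1];            # escaped \, { or }
        }
        elsif ( $rtf =~ /\G\\([a-zA-Z]+)-?\d* ?/gc ) {
            # control word, optional numeric parameter, optional delimiting space
            my $word = $1;
            if ( $SKIP_DEST{$word} ) {
                $skip[-1] = 1;
            }
            elsif ( !$skip[-1] ) {
                $out .= "\n" if $word eq 'par' or $word eq 'line';
                $out .= "\t" if $word eq 'tab';
            }
        }
        elsif ( $rtf =~ /\G[\r\n]+/gc ) {
            # raw line breaks in the RTF source carry no meaning
        }
        elsif ( $rtf =~ /\G([^\\{}\r\n]+)/gc ) {
            $out .= $1 unless $skip[-1];
        }
        else {
            $rtf =~ /\G\\?./gcs;    # other control symbols (\~, \- ...) are dropped
        }
    }
    return $out;
}
```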

    And I know this *really* isn't the answer you're looking for, but the OpenOffice source might be modular enough that you could have Perl invoke it to do the job.
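
    For what it's worth, later LibreOffice releases expose this directly on the command line. Assuming your soffice supports `--headless` and `--convert-to` (LibreOffice does; OpenOffice.org builds of this era may not), Perl can convert a Word file to OpenDocument without any macro. The helper names `odt_command` and `convert_to_odt` are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: argument list for a headless LibreOffice conversion of
# $in to OpenDocument Text; the .odt is written into $outdir.
sub odt_command {
    my ( $soffice, $in, $outdir ) = @_;
    return ( $soffice, '--headless', '--convert-to', 'odt',
        '--outdir', $outdir, $in );
}

# Run the conversion and die on a non-zero exit status.
sub convert_to_odt {
    my ( $soffice, $in, $outdir ) = @_;
    system( odt_command( $soffice, $in, $outdir ) ) == 0
        or die "conversion of $in failed: $?\n";
}
```

    The resulting .odt in $outdir can then be opened with OpenOffice::OODoc, which the question says already works.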

    --starX, axisoftime.com