Preceptor has asked for the wisdom of the Perl Monks concerning the following question:

Hello. I'm currently looking at re-inventing a change control system. Thus far, I've pretty much decided on a database backend, in which server configs etc. will be stored (along with logs of 'changes').
The problem I have is that we have a lot of servers, and I'd rather not go through every build sheet and copy all the items by hand.
Can anyone point me in the right direction for extracting information (i.e. server config info) from a Word document?
Thankfully, the content I am after is inserted into fields on a template, which _should_ make grabbing the info easier...
Cheers.
Ed.
--
It's not pessimism if there is a worse option, it's not paranoia when they are and it's not cynicism when you're right.

Re: Snarfing data from microsoft word
by t0mas (Priest) on Sep 18, 2002 at 11:28 UTC
    You could use something like this:
    #!/usr/bin/perl -w

    # Uses
    use strict;
    use Win32::OLE;
    use Win32::OLE::Const;

    # Create MSWord object and load constants
    my $MSWord = Win32::OLE->new('Word.Application', 'Quit')
        or die "Could not load MS Word\n";
    my $wd=Win32::OLE::Const->Load($MSWord);

    # Open document (full path)
    my $doc = $MSWord->Documents->Open('c:\full\path\to\document.doc');

    # Ask word to print the contents of a field named "TheFieldIWant"
    print $doc->FormFields("TheFieldIWant")->Result,"\n";

    # Close document (without save)
    $doc->Close({SaveChanges=>$wd->{wdDoNotSaveChanges}});


    /brother t0mas
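Since the question mentions many fields per build sheet, the single-field lookup above could be extended to walk the whole FormFields collection. This is only a sketch under assumptions: it needs Windows with MS Word installed, the path is a placeholder, and the %config hash is an illustrative name rather than anything from the thread. It relies on the `in` helper that Win32::OLE exports on request to enumerate a collection.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Win32::OLE qw(in);
use Win32::OLE::Const;

# Placeholder path; Word wants the full path to the document.
my $path = 'c:\full\path\to\document.doc';

# Start Word and load its constants, as in the script above.
my $word = Win32::OLE->new('Word.Application', 'Quit')
    or die "Could not load MS Word\n";
my $wd = Win32::OLE::Const->Load($word);

my $doc = $word->Documents->Open($path)
    or die "Could not open $path: ", Win32::OLE->LastError, "\n";

# Collect every form field as name => value (%config is an illustrative name).
my %config;
for my $field (in $doc->FormFields) {
    $config{ $field->Name } = $field->Result;
}

# Close document (without save)
$doc->Close({ SaveChanges => $wd->{wdDoNotSaveChanges} });

print "$_ = $config{$_}\n" for sort keys %config;
```

Wrapping the open/collect/close steps in a loop over a list of files would cover a whole directory of build sheets with a single Word instance.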
Re: Snarfing data from microsoft word
by alien_life_form (Pilgrim) on Sep 18, 2002 at 11:36 UTC
    Greetings,

    On Win32, you may want to consider:

    use Win32::OLE;
    my $word=Win32::OLE->new('word.application');
    my $doc=$word->Documents->Open('C:\foo\bar.doc');
    my $text=$doc->{Text};
    # $text=~s/\r/\n/g;
    OK, so this yields all the text of the document. If the text is inside text boxes (for instance), it will not be in $text.
    But $doc contains the entire DOM, so you can get at every part of the document, and churn it to your heart's content... so to speak. Your mileage will vary according to the complexity and variability of the structure of your document, etc.
    perl -MWin32::OLE -d -e 42
    will be your friend...
    Cheers,
    alf
    You can't have everything: where would you put it?
      I had to use:

      my $text=$doc->{Content}->{Text};
      ...but after that it works like a charm, and just happens to be exactly the code I need today!

      Peace,
      -McD

Re: Snarfing data from microsoft word
by joe++ (Friar) on Sep 18, 2002 at 10:18 UTC
    Hi Ed,

    Way back I used a program which was called "mswordview" in those days. It appears they have now changed its name to wvware (sourceforge). The core program is called "wv" (from "wordview") and there's a library called "libwmf".

    This should allow you to extract text, although I don't think there are any Perl bindings available.

    HTH!

    --
    Cheers, Joe

Re: Snarfing data from microsoft word
by bronto (Priest) on Sep 18, 2002 at 09:55 UTC

    You are on mined ground, Preceptor.

    Word files could have a lot of crap inside. Maybe the best thing you could do is to use a directory server and web based forms to put the data into it.

    I worked on a framework for this kind of thing. After getting a working version I am rewriting it from scratch, since when I started I didn't know enough OOP and LDAP, and the design wasn't that good.

    I'm looking for help for the rewrite ;-) If you are interested we could join forces!

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->made($love) ;
    }

Re: Snarfing data from microsoft word
by grantm (Parson) on Sep 18, 2002 at 10:25 UTC

    Another option would be to Save as HTML (perhaps a VBA macro to convert a batch). You could then use regexes to snarf from the HTML or possibly XPath.
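A rough sketch of the regex route, assuming the saved HTML puts each item in a two-cell table row (label, then value). That layout is a guess, and the sample fragment, the hostnames and the snarf_rows name are all made up for illustration; real Word HTML is far noisier, so the pattern would need adjusting against an actual export.

```perl
use strict;
use warnings;

# Hypothetical fragment standing in for a build sheet saved as HTML.
my $html = <<'HTML';
<table>
<tr><td>Hostname</td><td>web01</td></tr>
<tr><td>IP Address</td><td><b>10.0.0.5</b></td></tr>
</table>
HTML

# Pull label/value pairs out of two-cell table rows.
sub snarf_rows {
    my ($html) = @_;
    my %config;
    while ($html =~ m{<tr[^>]*>\s*<td[^>]*>(.*?)</td>\s*<td[^>]*>(.*?)</td>\s*</tr>}gis) {
        my ($label, $value) = ($1, $2);
        for ($label, $value) {
            s/<[^>]+>//g;      # drop any tags nested inside the cell
            s/^\s+|\s+$//g;    # trim surrounding whitespace
        }
        $config{$label} = $value;
    }
    return %config;
}

my %config = snarf_rows($html);
print "$_ = $config{$_}\n" for sort keys %config;
```

If the markup turns out to be irregular, a real HTML parser or the XPath approach is likely to hold up better than a regex.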

Re: Snarfing data from microsoft word
by zengargoyle (Deacon) on Sep 19, 2002 at 05:28 UTC

    catdoc is cool. It's not a Perl module, and you don't need MS Word. It saves me from evil .doc attachments every day.
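catdoc can still be driven from Perl by reading its output through a pipe. A minimal sketch, assuming catdoc is installed and on the PATH; doc_text is an illustrative name, and whether the template's field values come through in the plain text will depend on the document.

```perl
use strict;
use warnings;

# Return the plain text of one .doc file as printed by catdoc.
sub doc_text {
    my ($path) = @_;
    # List-form pipe open: no shell is involved, so odd file names are safe.
    open my $fh, '-|', 'catdoc', $path
        or die "Cannot run catdoc on $path: $!\n";
    local $/;                  # slurp mode: read everything at once
    my $text = <$fh>;
    close $fh or die "catdoc failed on $path\n";
    return $text;
}

# Dump the text of every file named on the command line.
print doc_text($_) for @ARGV;
```

From there the text can be picked apart with ordinary regexes before loading it into the database.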