
Here is some code that *might* work for you. I say *might* because I don't have Win32 Perl installed where I am, and my download is stuck at 1% ;) So take it for what it's worth: a starting point for your own code, cribbed together from some Win32 code I have in my home directory.
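use strict;
use warnings;
use File::Spec;
use Win32::OLE;
use Win32::OLE::Enum;

# this $text will hold all the text of all the Word docs
my $text = "";

# this is how we start Word from Perl ('Quit' closes Word when $word goes away)
my $word = Win32::OLE->new("Word.Application", 'Quit')
    or die "Cannot start Word: ", Win32::OLE->LastError();

# for every .doc file in the current directory
foreach my $filename (glob("*.doc")) {
    # Word resolves relative names against its own default folder,
    # so hand it an absolute path
    my $path = File::Spec->rel2abs($filename);

    # tell Word to open the file we just found
    my $doc = $word->Documents->Open($path)
        or die "Cannot open $path: ", Win32::OLE->LastError();

    # get an enumerator over all the paragraphs in the doc
    my $paragraphs = Win32::OLE::Enum->new($doc->Paragraphs());

    # for every paragraph...
    while (defined(my $paragraph = $paragraphs->Next())) {
        # append the paragraph text (it ends with Word's paragraph mark, "\r")
        $text .= $paragraph->{Range}->{Text};
    }
    $doc->Close();
}

# Word separates paragraphs with a bare "\r", not "\r\n"
foreach my $line (split(/\r/, $text)) {
    # match against $line explicitly; a bare /.../ would test $_ instead
    if ($line =~ /some pattern/) {
        print "$line\n";    # ..do something..
    }
}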
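Since Word can't be driven from every machine (mine included, right now), you can try the pattern-matching half on a plain string first. This is a minimal sketch, assuming the collected text looks like what the loop above builds: paragraphs ending in Word's "\r" paragraph mark. The helper `find_matches` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split text collected from Word paragraphs and
# return the lines that match $pattern. Assumes paragraphs end with "\r";
# a following "\n" (e.g. text saved to disk with CRLF) is tolerated too.
sub find_matches {
    my ($text, $pattern) = @_;
    my @hits;
    foreach my $line (split /\r\n?/, $text) {
        next if $line eq '';    # skip empty paragraphs
        push @hits, $line if $line =~ $pattern;
    }
    return @hits;
}

# Made-up sample shaped like the text the Word loop collects
my $sample = "Name: Alice\rAge: 30\r\rName: Bob\r";
print "$_\n" for find_matches($sample, qr/^Name:/);
```

Once a pattern does what you want here, drop it into the `if` in the Word loop.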
