in reply to OLE Find with Paragraphs()

It is hard to see exactly what you are trying to do. By far the easiest way to do this is to convert the doc to a text file. Then either slurp it in and split it into paragraphs, or set $/ = "\n\n" and read it in one paragraph at a time. This sub will do the conversion for you.

my $wdFormatText = 2;    # wdFormatText constant from Win32::OLE::Const 'Microsoft Word'

sub convert_doc_to_text {
    my ( $infile, $outfile ) = @_;
    require Win32::OLE;
    # the second argument is a destructor that quits Word when $word goes away
    my $word = Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit; } );
    error("Can't create new instance of Word, Reason:$Win32::OLE::LastError") unless $word;
    $word->{visible} = 0;
    my $doc = $word->{Documents}->Open($infile);
    error("Can't open $infile, Reason:$Win32::OLE::LastError") unless $doc;
    # wdFormatDocument wdFormatText wdFormatHTML
    $doc->SaveAs( { FileName => $outfile, FileFormat => $wdFormatText } );
    $doc->Close;
    undef $doc;
    undef $word;
}

sub error { die shift }
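Once the text file exists, reading it a paragraph at a time could look something like the sketch below. It swaps in Perl's paragraph mode ($/ = '') rather than $/ = "\n\n", because paragraph mode also swallows runs of extra blank lines. The name read_paragraphs and the sample text are made up for illustration, and the sketch assumes paragraphs are separated by blank lines; if the exported file turns out to hold one paragraph per line, a plain line-by-line read does the same job.

```perl
use strict;
use warnings;

# Read a filehandle one paragraph at a time.
# Setting $/ to the empty string turns on paragraph mode:
# one or more blank lines end a record.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = '';
    my @paras;
    while ( my $para = <$fh> ) {
        $para =~ s/\s+\z//;    # trim the trailing newlines
        push @paras, $para if length $para;
    }
    return @paras;
}

# Demo on an in-memory file so the sketch runs without Word.
my $text = "Introduction\nfirst line\n\nScope\n\n\nDesign\nlast line\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @paras = read_paragraphs($fh);
close $fh;

print scalar(@paras), " paragraphs\n";
print "[$_]\n" for @paras;
```

For a real file, open the $outfile written by convert_doc_to_text instead of the in-memory string.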

cheers

tachyon

Re^2: OLE Find with Paragraphs()
by Discipulus (Canon) on Jun 11, 2004 at 08:33 UTC
    Thanks tachyon!!

    Any time you post something in the monastery I learn something, and I cut & paste it somewhere ...

    Your code is so clean!
    The first sub that I made was a recursive one, and when I saw yours (now in my snippet folder, where I call it tachyon_recursive) I was stunned.

    Now I clean up my style with the simplest one:
    sub error { die shift }
Re^2: OLE Find with Paragraphs()
by wyldtek (Initiate) on Jun 11, 2004 at 15:20 UTC
    What I am trying to do is verify that certain headings exist and are in the correct order within this Word doc. This is just the first step in this program. Next I will need to copy certain sections from another Word document into this one under some of the headings. I was trying to avoid doing a text conversion because I will need the finished Word document to retain all its headings and styles. Is there any way in Perl to search a Word doc and return the line number of the text I was looking for? Any help would be greatly appreciated.

      Word docs don't really have line numbers in the same way a text file does. Here is a search and replace sub that may point you in the correct direction. Another option is RTF format. It is easy to munge with Perl and regexes but can retain most general formatting. YMMV.

      # $DOC_DIR, get_tempfile(), read_file() and create_file() are defined
      # elsewhere in the original program; $wdReplaceAll is the Word constant (2)
      sub word_find_and_replace {
          my ( $word, $rel_file_path, $tokens_ref ) = @_;

          # first make a temporary file to do the search and replace on
          my ( $fh, $temp_name ) = get_tempfile( "$DOC_DIR/system", 'doc' );
          close $fh;
          my $content_ref = read_file( "$DOC_DIR/$rel_file_path" );
          create_file( "$DOC_DIR/system/$temp_name", $content_ref, 'overwrite ok' );

          $word->{visible} = 0;
          my $doc         = $word->{Documents}->Open("$DOC_DIR/system/$temp_name");
          my $search_obj  = $doc->Content->Find;
          my $replace_obj = $search_obj->Replacement;

          for my $token ( keys %$tokens_ref ) {
              my $find    = '<?' . $token . '?>';
              my $replace = $tokens_ref->{$token};
              # now i know this looks weird but M$ word (at least 2000) wants \r
              # as the para marker not \r\n or even \n if you send \n you get little
              # binary squares..... oh well that's M$ for you.
              $replace =~ s/\r\n|\n/\r/g;    # this makes it work properly. GOK
              $search_obj->{Text}  = $find;
              $replace_obj->{Text} = $replace;
              $search_obj->Execute( { Replace => $wdReplaceAll } );
          }
          $doc->Save;
          $doc->Close;

          # now get the data out of the modified temp file
          $content_ref = read_file( "$DOC_DIR/system/$temp_name" );

          # remove our unwanted temp files and objects
          unlink "$DOC_DIR/system/$temp_name";
          undef $search_obj;
          undef $replace_obj;
          undef $doc;
          return $content_ref;
      }
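The RTF route mentioned above might look roughly like this. It is only a sketch: rtf_escape and rtf_fill_tokens are made-up names, the <?token?> placeholders follow the convention in the sub above, and it assumes ASCII replacement text and that each placeholder sits in the RTF as one contiguous run (Word sometimes splits text across formatting runs, in which case the regex will not see the token).

```perl
use strict;
use warnings;

# Escape plain text for RTF: backslash and braces are special,
# and a line break becomes a \par control word.
sub rtf_escape {
    my ($text) = @_;
    $text =~ s/([\\{}])/\\$1/g;
    $text =~ s/\r\n|\r|\n/\\par /g;
    return $text;
}

# Replace <?name?> placeholders with escaped values from a hash ref.
# Unknown placeholders are left untouched.
sub rtf_fill_tokens {
    my ( $rtf, $tokens_ref ) = @_;
    $rtf =~ s{<\?(\w+)\?>}{
        exists $tokens_ref->{$1} ? rtf_escape( $tokens_ref->{$1} ) : "<?$1?>"
    }gex;
    return $rtf;
}

# Demo on a tiny hand-written RTF fragment.
my $rtf = '{\rtf1\ansi {\b <?title?>}\par <?body?>\par <?unknown?>}';
my $out = rtf_fill_tokens(
    $rtf,
    { title => 'Spec {v1}', body => "line one\nline two" },
);
print "$out\n";
```

The control words around the placeholders are never touched, so the bold run in the demo survives the replacement.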

      cheers

      tachyon

Re^2: OLE Find with Paragraphs()
by wyldtek (Initiate) on Jun 15, 2004 at 15:28 UTC
    I was able to use the Range object that is returned from Content to get the start and end points of what I found.
    use Win32::OLE;
    use Win32::OLE::Enum;
    use Win32::OLE::Variant;
    use Win32::OLE::Const;
    use File::Find;

    use vars qw($MSWord $wd $startdir);

    # Create new MSWord object and load constants
    $MSWord = Win32::OLE->new( 'Word.Application', 'Quit' )
        or die "Could not load MS Word";
    $wd = Win32::OLE::Const->Load($MSWord);

    open CONFIG, "H:\\perl\\config.txt"
        or die "Cannot open config file: $!";
    my @config_lines = <CONFIG>;
    close CONFIG;
    chomp @config_lines;

    my $doc = $MSWord->Documents->Open("H:\\perl\\Functional Design Template r1.6 baseline.doc")
        or die "Cannot open Word document";
    my $lastEnd = 0;

    foreach my $config_line (@config_lines) {
        # start each search from a fresh Range covering the whole document
        my $content = $doc->Content;
        my $find    = $content->Find;
        my $heading = ( split /===/, $config_line )[0];
        print $heading . "\n";

        # ^p matches the paragraph mark, so the heading must end its paragraph
        $find->{Text} = $heading . "^p";
        $find->Execute( { Forward => 1 } );
        my $output = $find->Found;
        print $output . "\n";

        # test Found first: a failed search does not redefine the Range
        if ( !$output ) {
            print "Can't find " . $heading . "\n";
            die "Heading not found";
        }

        # on success the Range has shrunk to the matched text
        print $content->Start . "\n";
        print $content->End . "\n";
        if ( $content->Start < $lastEnd ) {
            die $heading . ": heading out of order";
        }
        $lastEnd = $content->End;
    }

    print "DONE";
    $doc->Close( { SaveChanges => $wd->{wdSaveChanges} } );
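The ordering logic itself can be tried out without Word. The sketch below is an illustration, not the code above: check_heading_order is a made-up name, it works on a plain string (for example the Text of the document's Content range), and it assumes "\r" as the paragraph mark. Offsets in a Perl string are not guaranteed to match Word's Start and End values, so treat it as a way to test the rule that each heading must start at or after the end of the previous one.

```perl
use strict;
use warnings;

# Return a list of problem messages. An empty list means every heading was
# found and each one starts at or after the end of the previous one.
sub check_heading_order {
    my ( $text, $headings_ref, $para_mark ) = @_;
    $para_mark = "\r" unless defined $para_mark;    # assumed paragraph mark
    my @problems;
    my $last_end = 0;
    for my $heading (@$headings_ref) {
        # like Text . "^p": the heading must end its paragraph
        my $needle = $heading . $para_mark;
        # always search from the top, as each fresh Content range does
        my $start = index( $text, $needle );
        if ( $start < 0 ) {
            push @problems, "$heading: heading not found";
            next;
        }
        if ( $start < $last_end ) {
            push @problems, "$heading: heading out of order";
            next;
        }
        $last_end = $start + length $needle;
    }
    return @problems;
}

# Demo: Scope appears before Design in the text, and Testing is missing.
my $text     = "Overview\rsome text\rScope\rmore\rDesign\r";
my @problems = check_heading_order( $text, [qw(Overview Design Scope Testing)] );
print "$_\n" for @problems;
```

Unlike the die calls in the script above, this version collects every problem, which is handy when a template has several headings wrong at once.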