
ZJ.Mike.2009 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to find occurrence information for a word in a .doc file using Win32::OLE. I know how to do the find-the-word part, but I also want to know the page on which the word appears.

Say test.doc contains the word 'perl' on page 1. The following code finds the word correctly but not the page number.

Can someone give me some hints? Thanks :)
use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

my $query = 'perl';
my $file  = "E:/test.doc";

my $Word = Win32::OLE->new('Word.Application', 'Quit');
$Word->{'Visible'} = 0;
my $doc = $Word->Documents->Open($file)
    or die "Error opening the document!\n";

my $paragraphs = $doc->Paragraphs();
my $enumerate  = Win32::OLE::Enum->new($paragraphs);
while ( defined( my $paragraph = $enumerate->Next() ) ) {
    my $words = Win32::OLE::Enum->new( $paragraph->{Range}->{Words} );
    while ( defined( my $word = $words->Next() ) ) {
        my $text = $word->{Text};
        if ( $text =~ /$query/ ) {
            #$text->Select;       # doesn't work: $text is a plain string
            $paragraph->Select;   # wrong page number
            my $page_number = $Word->Selection->Information(wdActiveEndPageNumber);
            print "Found $text in Page $page_number!\n";
        }
    }
}

$Word->ActiveDocument->Close;
$Word->Quit;
==UPDATE==

Problem Solved

I've just found that replacing the $paragraph->Select line with $word->Select makes the code seem to report the correct page number :)
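A likely reason the fix works: Information(wdActiveEndPageNumber) reports the page that holds the active end of the selection, and selecting a whole paragraph puts that end at the paragraph's last character. A paragraph that runs onto the next page therefore reports the later page for every word in it. Selecting the word narrows this down. A variant that avoids touching the selection at all is to ask each word's Range directly, since Word's Range object also has an Information property. The sketch below assumes that; the sub name word_hit_pages is hypothetical, it walks Document.Words by index (Count/Item) instead of Win32::OLE::Enum, and the trimmed, case-insensitive whole-word comparison is a choice made here rather than the original substring match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the page number of every word in $doc that
# equals $query. $doc is a Word Document object from Win32::OLE, and
# $page_info is the value of the Word constant wdActiveEndPageNumber.
# Each word is a Range, so Information() is asked of the word itself and
# nothing is selected.
sub word_hit_pages {
    my ($doc, $query, $page_info) = @_;
    my $words = $doc->Words;
    my @pages;
    for my $i (1 .. $words->Count) {          # Word collections are 1-based
        my $word = $words->Item($i);
        # Word ranges for words usually carry a trailing space, so trim it
        # and compare case-insensitively (an assumption, see lead-in).
        (my $text = $word->{Text}) =~ s/\s+\z//;
        push @pages, $word->Information($page_info) if lc $text eq lc $query;
    }
    return @pages;
}

# Example driver; it only runs on Windows with Word installed.
if ($^O eq 'MSWin32') {
    require Win32::OLE;
    require Win32::OLE::Const;
    my $app = Win32::OLE->new('Word.Application', 'Quit') or die "No Word\n";
    my $wd  = Win32::OLE::Const->Load($app);   # hashref of Word constants
    my $doc = $app->Documents->Open('E:/test.doc') or die "Cannot open\n";
    print "Found perl on page $_\n"
        for word_hit_pages($doc, 'perl', $wd->{wdActiveEndPageNumber});
    $doc->Close;
    $app->Quit;
}
```

Iterating Words one Item at a time can be slow on long documents; it is used here only to keep the sketch short.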

Re: Find occurrence information on a word in a doc file using Win32::OLE
by Generoso (Prior) on Jun 08, 2011 at 15:32 UTC

    Do you know how to do what you're trying to do in Word without Perl?
    I recommend you figure that out first and then do it with Win32::OLE in Perl.

      Maybe this macro will help.

      Sub Test()
          Dim i As Long
          Dim myInfo As String
          Dim myRange As Range
          Dim rgeSave As Range

          Set rgeSave = Selection.Range
          Application.ScreenUpdating = False

          'Loop: Do a Search, Then Execute Some Other Commands Inside
          'a "Do Until End of Document" Loop (version 2, thanks to Shawn Wilson):
          Selection.Find.ClearFormatting
          Selection.Find.Replacement.ClearFormatting
          With Selection.Find
              .Text = "13"
              .Forward = True
              .Wrap = wdFindStop
              .Format = False
              .MatchCase = False
              .MatchWholeWord = False
              .MatchWildcards = False
              .MatchSoundsLike = False
              .MatchAllWordForms = False
          End With

          Do While Selection.Find.Execute
              'Do something within the found text
              myInfo = myInfo & Selection.Information(wdFirstCharacterLineNumber) _
                  & " " & Selection.Information(wdActiveEndAdjustedPageNumber) & " "
          Loop

          MsgBox myInfo
          Application.ScreenRefresh
          Application.ScreenUpdating = True
          rgeSave.Select
      End Sub
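For comparison, the macro's Find loop can be driven from Perl through Win32::OLE. This is a sketch, not a tested port: it searches a Range taken from Document.Content instead of the Selection (a deliberate swap, so the cursor position does not matter), and relies on Word's behavior that a successful Range.Find.Execute moves the range onto the match, after which Collapse steps past it. The sub name find_hits is hypothetical, and the Word constants are passed in through a hashref such as the one Win32::OLE::Const->Load returns.

```perl
use strict;
use warnings;

# Hypothetical port of the macro's loop: returns [line, page] pairs for
# every match of $text in $doc (a Word Document object). $wd is a hashref
# of Word constants, e.g. from Win32::OLE::Const->Load($word_app).
sub find_hits {
    my ($doc, $text, $wd) = @_;
    my $range = $doc->Content;             # whole document, not the Selection
    my $find  = $range->Find;
    $find->ClearFormatting;
    $find->{Text}      = $text;
    $find->{Forward}   = 1;
    $find->{Wrap}      = $wd->{wdFindStop};
    $find->{Format}    = 0;
    $find->{MatchCase} = 0;

    my @hits;
    while ($find->Execute) {               # on success $range is the match
        push @hits, [
            $range->Information($wd->{wdFirstCharacterLineNumber}),
            $range->Information($wd->{wdActiveEndAdjustedPageNumber}),
        ];
        $range->Collapse($wd->{wdCollapseEnd});   # continue after the match
    }
    return @hits;
}
```

On Windows it would be called as find_hits($doc, 'perl', Win32::OLE::Const->Load($app)), where $app is the Word.Application object and $doc an opened Document.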

        @Generoso, thank you for the macro snippet. Really appreciate it :)

        I had never used a macro until now. I replaced .Text = "13" with .Text = "perl", ran the modified macro on test.doc, and got the desired information. Cool!

        But the problem I posted is only a small part of a larger problem. In the real task it's not a single word but a list of words, and I also need to remove duplicate page numbers (e.g. if 'perl' occurs more than once on one page, that page should count only once).

        I guess the macro can probably do that too, although the coding might be of a different complexity than a Perl solution. I'll probably need to dig deeper.

        Anyway, it's good to know the macro solution. Thank you again!
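For the larger task described in the last reply (several query words, each page counted once), the de-duplication is plain Perl once the (word, page) pairs have been collected from Word, for example by a Find loop like the macro's. A minimal sketch, assuming the hits arrive as a list of [word, page] pairs; the sub name unique_pages and that input shape are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given [word, page] pairs (one per occurrence),
# return a hashref mapping each word to its sorted list of distinct pages,
# so a word that occurs several times on one page counts that page once.
sub unique_pages {
    my @hits = @_;
    my %seen;                                  # word => { page => 1 }
    $seen{ $_->[0] }{ $_->[1] } = 1 for @hits;

    my %pages;
    for my $word (keys %seen) {
        $pages{$word} = [ sort { $a <=> $b } keys %{ $seen{$word} } ];
    }
    return \%pages;
}

my $pages = unique_pages(
    [perl => 1], [perl => 1], [perl => 3], [monks => 2], [monks => 2],
);
for my $w (sort keys %$pages) {
    print "$w: pages @{ $pages->{$w} }\n";
}
# prints:
# monks: pages 2
# perl: pages 1 3
```

A hash of hashes is used because it drops repeated pages for free; the numeric sort only matters for printing.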