PerlMonks  

OLE Word...

by jpavel (Sexton)
on Mar 24, 2004 at 14:49 UTC [id://339418]

jpavel has asked for the wisdom of the Perl Monks concerning the following question:

Don't know if anybody happens to have a snippet of code lying about that demonstrates how to nab the current page number from a Word document? Not as easy as I thought... I can pull words, formatting, structure, word counts, etc., but I can't get the darn page number!

The project is a web page used to search a Word document, match the song (the doc contains music sheets), and publish the matching song in .pdf format. The code in my reply below searches and parses correctly, but my thought was to publish to .pdf based on page numbers... any alternative thoughts would be welcome as well!

... I moved the code to a response so this isn't such a long post...
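Once a song's first and last page are known, Word 2007 and later can write just that page range to PDF with `Document.ExportAsFixedFormat`. A minimal sketch under these assumptions: `$doc` is an open Word Document object, `$wd` is the hash of Word constants returned by `Win32::OLE::Const->Load('Microsoft Word')`, and the sub name `export_pages_to_pdf` and any output path are hypothetical. Only a few of the method's arguments are passed, by name.

```perl
use strict;
use warnings;

# Hypothetical helper: export pages $from..$to of an open Word document to PDF.
# Needs Word 2007 or later (ExportAsFixedFormat). $wd is a hashref of Word
# constants such as Win32::OLE::Const->Load('Microsoft Word') returns.
sub export_pages_to_pdf {
    my ($doc, $wd, $pdf_path, $from, $to) = @_;
    die "bad page range $from-$to\n" if $from < 1 || $to < $from;

    # Win32::OLE passes a trailing hashref as named OLE arguments.
    $doc->ExportAsFixedFormat({
        OutputFileName  => $pdf_path,
        ExportFormat    => $wd->{wdExportFormatPDF},
        OpenAfterExport => 0,
        Range           => $wd->{wdExportFromTo},   # export only From..To
        From            => $from,
        To              => $to,
    });
    return $pdf_path;
}
```

After the call, `Win32::OLE->LastError` reports any COM failure, for example on an older Word that lacks this method.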

Re: OLE Word...
by jpavel (Sexton) on Mar 24, 2004 at 15:07 UTC
    #!/Perl/bin/perl
    use CGI qw(:standard);
    use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
    use Win32::OLE qw(in with);
    use Win32::OLE::Const 'Microsoft Word';
    use Win32::OLE::Enum;
    use strict;

    print header;
    print start_html("Processing...");
    print h3("Processing your request...");

    my $text = param("search_text");
    my $type = param("type");
    print "You requested a search for the words \"$text\" in the documents where they appear as $type...\n";
    my $time = localtime;

    # Reuse a running Word instance if there is one; otherwise start one
    # that is told to Quit when the Perl object is destroyed.
    my $Word = Win32::OLE->GetActiveObject('Word.Application')
        || Win32::OLE->new('Word.Application', 'Quit');
    my $infile = "<Word document here>";
    $Word->Documents->Open($infile, {ReadOnly => 1})
        || die("Unable to open document ", Win32::OLE->LastError());
    $Word->{visible} = 0;
    $Word->{DisplayAlerts} = 0;

    my $range = $Word->ActiveDocument->Content;
    my $current_song;
    my $flag;
    my @mytext;
    my @mystyle;
    my $compare;
    my %song;
    my $song_count = 0;

    # Collect every word of the document together with its style name.
    foreach my $word (in $range->Words) {
        push(@mytext, $word->{Text});
        push(@mystyle, $word->{Style}->{NameLocal});
    }

    # Walk the parallel arrays, grouping runs of words by style into the
    # Title, Songwriter and Lyrics fields of the current song.
    for (my $x = 0; $x <= $#mytext; $x++) {
        my $current_song;
        my $current_writer;
        my $current_lyrics;
        if ($mystyle[$x] eq "Heading 1") {
            my $counter = 0;
            undef $current_song;
            while ($mystyle[$x+$counter] eq "Heading 1") {
                $current_song .= $mytext[$x+$counter];
                $counter++;
            }
            # Skip past the run (the loop's own $x++ then advances one more word).
            $x += $counter;
            chop($current_song);
            chop($current_song);
            $song{Title}[$song_count] = $current_song;
        }
        elsif ($mystyle[$x] =~ /Songwriter/) {
            my $counter = 0;
            undef $current_writer;
            while ($mystyle[$x+$counter] =~ /Songwriter/) {
                $current_writer .= $mytext[$x+$counter];
                $counter++;
            }
            $x += $counter;
            chop($current_writer);
            chop($current_writer);
            $song{Songwriter}[$song_count] = $current_writer;
        }
        else {
            my $counter = 0;
            undef $current_lyrics;
            while ($mystyle[$x+$counter] =~ /Lyrics|Chord/) {
                # Keep lyric words; chord symbols are skipped.
                if ($mystyle[$x+$counter] eq "Lyrics") {
                    $current_lyrics .= $mytext[$x+$counter];
                }
                $counter++;
            }
            $song{Lyrics}[$song_count] = $current_lyrics;
            $song_count++;
            $x += $counter;
        }
    }

    print "<br><br>Matches:<ol>";
    my $matching_song;
    for (my $x = 0; $x < $song_count; $x++) {
        if (($type eq "Heading 1"  and $song{Title}[$x]      =~ /$text/i) or
            ($type eq "Lyrics"     and $song{Lyrics}[$x]     =~ /$text/i) or
            ($type eq "Songwriter" and $song{Songwriter}[$x] =~ /$text/i)) {
            print "<li><b>$song{Title}[$x]:</b> <i>$song{Songwriter}[$x]</i>\n";
            print "<ul><li><pre>$song{Lyrics}[$x]</pre>\n";
            print "<li><a href=\"gen_song.pl?title=$song{Title}[$x]&format=doc\">$song{Title}[$x].doc</a>\n";
            print "<li><a href=\"gen_song.pl?title=$song{Title}[$x]&format=pdf\">$song{Title}[$x].pdf</a></ul>\n";
            $matching_song = $x;
        }
    }
    print "</ol>";
    print "<br><br><br><i><font color=blue>Query started at $time";
    $time = localtime;
    print "<br>Query completed at $time";
    print end_html;
    $Word->Quit();
    undef $Word;
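In the parsing loop above, `$x += $counter` followed by the loop's own `$x++` means the word right after each run is never examined. A minimal sketch of the same grouping as one pass with no index arithmetic, assuming the style names the script already uses (`Heading 1`, a style matching `Songwriter`, `Lyrics`, `Chord`) and that every song begins with a `Heading 1` run. The sub name `group_songs` is hypothetical; it takes references to the `@mytext`/`@mystyle` arrays built by the `Words` loop.

```perl
use strict;
use warnings;

# Hypothetical helper: group the parallel word/style arrays into a list of
# songs ({Title, Songwriter, Lyrics} hashrefs) in a single pass.
# Assumes each song starts with a run of "Heading 1" words.
sub group_songs {
    my ($words, $styles) = @_;
    my @songs;
    my $prev = '';
    for my $i (0 .. $#$words) {
        my $style = $styles->[$i] // '';
        # The first word of a new "Heading 1" run opens a new song record.
        if ($style eq 'Heading 1' && $prev ne 'Heading 1') {
            push @songs, { Title => '', Songwriter => '', Lyrics => '' };
        }
        $prev = $style;
        next unless @songs;    # ignore anything before the first title

        # Map the style to a field; chords and any other styles are skipped.
        my $field = $style eq 'Heading 1'  ? 'Title'
                  : $style =~ /Songwriter/ ? 'Songwriter'
                  : $style eq 'Lyrics'     ? 'Lyrics'
                  :                          undef;
        $songs[-1]{$field} .= $words->[$i] if defined $field;
    }
    # Strip trailing whitespace, which includes Word's "\r" paragraph marks,
    # instead of chopping a fixed number of characters.
    for my $song (@songs) {
        s/\s+\z// for @{$song}{qw(Title Songwriter)};
    }
    return @songs;
}
```

Searching then becomes a `grep` over the returned records, for example `grep { $_->{Lyrics} =~ /\Q$text\E/i } @songs`, where `\Q...\E` keeps regex metacharacters in the user's search text (such as an unbalanced parenthesis) from breaking the match.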
Re: OLE Word...
by Zero_Flop (Pilgrim) on Mar 25, 2004 at 06:57 UTC
    I would suggest you do what guha mentions. It is preferable to use the functions offered within the application to do the dirty work: it will simplify your code, and the functions will probably be optimized for the application.

    Anyway, if you still want to know what the page number is, use:

    Either
    Selection.Information (wdActiveEndPageNumber)
    or
    Selection.Information (wdActiveEndAdjustedPageNumber)

    I found this with a quick Google search. You may want to do your own search, because MS had some problems with these and has suggestions for Word 2000.
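From Perl via Win32::OLE, the same property is read with method-call syntax, `$range->Information(...)`. A minimal sketch, assuming `$range` is a Word Range object (for instance a matched song's text) and `$wd` is the constant hash from `Win32::OLE::Const->Load('Microsoft Word')`; the sub name `page_span` is hypothetical. Information reports the page of the range's end, so the start page is read from a collapsed copy.

```perl
use strict;
use warnings;

# Hypothetical helper: return (first_page, last_page) for a Word Range.
# $wd is a hashref of Word constants, e.g. from
#   Win32::OLE::Const->Load('Microsoft Word')
# $which picks wdActiveEndPageNumber (counted from the start of the document,
# the default here) or wdActiveEndAdjustedPageNumber (honours manual
# page-numbering adjustments such as a changed starting number).
sub page_span {
    my ($range, $wd, $which) = @_;
    $which //= $wd->{wdActiveEndPageNumber};

    # Collapse a duplicate to its start so the caller's range is untouched.
    my $start = $range->Duplicate;
    $start->Collapse($wd->{wdCollapseStart});

    return ($start->Information($which), $range->Information($which));
}
```

On Windows, `$range` could come from, for example, `$doc->Paragraphs->Item(1)->Range`.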
Re: OLE Word... (or PDF?)
by bart (Canon) on Mar 24, 2004 at 18:48 UTC
    Why are you storing the data as a Word file? Why not convert the document to PDF up front, search that, and if you find a match, extract the page into a new PDF? I'm sure that would be faster.

    Or just make every song into a separate PDF file, and store the text and an index pointer in a database (maybe a flat file).
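Such a flat file could hold one tab-separated line per song: title, songwriter, the song's PDF file, and the searchable lyrics. A minimal sketch; the column layout and the sub names `write_index` and `search_index` are hypothetical. Tabs and line breaks inside fields (including Word's `\r`) are flattened to spaces so each record stays on one line.

```perl
use strict;
use warnings;

my @FIELDS = qw(Title Songwriter Pdf Lyrics);    # hypothetical column layout

# Write one tab-separated line per song (each song is a hashref).
sub write_index {
    my ($path, @songs) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    for my $song (@songs) {
        my @row = map { my $v = $song->{$_} // ''; $v =~ s/[\t\r\n]+/ /g; $v } @FIELDS;
        print {$fh} join("\t", @row), "\n";
    }
    close $fh or die "Can't close $path: $!";
    return;
}

# Return the songs whose $field contains $text, case-insensitively.
# \Q...\E treats the user's search text literally rather than as a regex.
sub search_index {
    my ($path, $field, $text) = @_;
    open my $fh, '<', $path or die "Can't read $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my %song;
        @song{@FIELDS} = split /\t/, $line, -1;
        push @hits, \%song if ($song{$field} // '') =~ /\Q$text\E/i;
    }
    close $fh;
    return @hits;
}
```

Each hit's `Pdf` column then gives the file to link to directly, so no Word instance is needed at search time.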

      In an ideal world, I'd love that. The creator of this solution sent out a Word template, so all the users are submitting docs, which will be continually edited/updated from the repository. Ideally, the web front end pulls directly from these files to make sure it has the latest version. Not an ideal solution, but something I hope to be able to work with.
Re: OLE Word...
by guha (Priest) on Mar 24, 2004 at 21:58 UTC

    Well, I guess you are familiar with the OLE Browser. My advice is to check the Information property of a Selection or Range object, specifically with the wdActiveEndAdjustedPageNumber constant.

    Moreover, I suspect you will find that the Find object of a Range or Selection (through its Execute method) can do what you want in terms of searching.

    Heck, it advertises the ability to do MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike, and MatchAllWordForms. Doesn't that sound tasty?

    HTsomewhatH
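Putting both suggestions together, Word's Find can locate each hit and the found range can then report its page. A minimal sketch, assuming `$doc` is an open Word Document and `$wd` is the constant hash from `Win32::OLE::Const->Load('Microsoft Word')`; the sub name `find_pages` is hypothetical, and only a few of Execute's arguments are shown.

```perl
use strict;
use warnings;

# Hypothetical helper: return the adjusted page number of every match of
# $text in $doc, letting Word's Find object do the searching.
sub find_pages {
    my ($doc, $wd, $text, %opt) = @_;
    my $range = $doc->Content;    # the document's main text
    my $find  = $range->Find;     # Find object bound to $range
    my @pages;

    # Execute returns true on a hit and redefines $range to the matched text.
    # Win32::OLE passes the trailing hashref as named OLE arguments.
    while ($find->Execute({
        FindText       => $text,
        MatchCase      => $opt{MatchCase}      ? 1 : 0,
        MatchWholeWord => $opt{MatchWholeWord} ? 1 : 0,
        Forward        => 1,
        Wrap           => $wd->{wdFindStop},   # stop at the end, don't wrap
    })) {
        push @pages, $range->Information($wd->{wdActiveEndAdjustedPageNumber});
        # Continue searching after this hit.
        $range->Collapse($wd->{wdCollapseEnd});
    }
    return @pages;
}
```

Because Word does the matching, options such as `MatchWildcards` or `MatchSoundsLike` can be added to the same hash without changing the Perl side.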

Approved by Corion