Howdy Monks. I am having a strange problem with Win32::OLE and Word. Running this simple script:
use strict;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

my $filename = 'C:\temp\test.doc';
my $word = Win32::OLE->new('Word.Application', 'Quit');
my $doc = $word->Documents->Open($filename)
    || die("Unable to open document ", Win32::OLE->LastError());
my $nwords = $doc->Words->Count;
my @wordtext;
my @wordcolor;
my $starttime = time;
for (my $i = 1; $i <= $nwords; $i++) {
    $wordcolor[$i] = $doc->Words->Item($i)->HighlightColorIndex;
    $wordtext[$i]  = $doc->Words->Item($i)->Text;
}
my $elapsed = time - $starttime;
printf "\n\n%6.0D words\n%6.0D sec\n%5.3f per word", $nwords, $elapsed, $elapsed/$nwords;
$doc->Close;
undef $word;
on a 30-page paper results in a processing rate of 0.158 sec/word, or about 380 words per minute, on a 3.4 GHz P4 with many gigs of memory. Now I know OLE is a dog, but that's about human reading speed! Clearly something is wrong.

The stranger thing is that I did some experimenting, and the time per word is a linear function of the number of words in the document. In other words, each word is read twice as fast in a document half the size, twice as slowly in a document twice the size, and so on. This one really baffles me because I can't imagine why the size of the document would influence the reading speed, unless it has to count from the beginning every time you request a word, and surely they didn't design the method like that.
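
To make that suspicion concrete, here is a toy model in plain Perl, with no Word or OLE involved. It assumes, purely hypothetically, that the collection is a linked list and that each indexed lookup walks it from the head. The helpers make_list, item_from_head and total_steps are made up for this illustration and say nothing about how Word actually implements Words->Item.

```perl
use strict;
use warnings;

# Hypothetical model: the collection is a singly linked list, and an
# indexed lookup has to walk from the head every time.
sub make_list {
    my ($n) = @_;
    my $head;
    # Build back to front so that node 1 ends up at the head.
    for my $i (reverse 1 .. $n) {
        $head = { text => "word$i", next => $head };
    }
    return $head;
}

# Return (node, nodes visited) for the 1-based index $i.
sub item_from_head {
    my ($head, $i) = @_;
    my $node  = $head;
    my $steps = 1;
    while ($steps < $i) {
        $node = $node->{next};
        $steps++;
    }
    return ($node, $steps);
}

# Total nodes visited when every item is fetched by index, as in my loop.
sub total_steps {
    my ($n) = @_;
    my $head  = make_list($n);
    my $total = 0;
    for my $i (1 .. $n) {
        my (undef, $steps) = item_from_head($head, $i);
        $total += $steps;
    }
    return $total;
}

for my $n (1000, 2000, 4000) {
    my $total = total_steps($n);
    printf "%5d words: %9d steps, %7.1f steps per word\n",
        $n, $total, $total / $n;
}
```

Under this assumption the total work is n(n+1)/2 steps, so the cost per word doubles whenever the document doubles in size, which is the same pattern I am seeing. That does not prove Word works this way, only that the timings are consistent with it.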

So I checked the archives and found this thread about Word slowness and tried some of the suggested fixes. Disabling the Word macro checker and disabling DEP helped maybe a little but didn't solve the problem.

Anybody know how I can speed this thing up? Can I get the text and color info in larger chunks, maybe?

Many thanks....

Steve


In reply to Win32 Word behaving strangely by cormanaz
