You're actually waiting for MSIE to finish building the page. For Google, I see no problem, but for a page that partly depends on JavaScript to complete itself (using document.write(), for example), you have to wait.

What I've done until now is load the page twice and then wait a second. Not great, but it worked rather well. But you just gave me a new hint.

So I tried printing out $ie->{agent}->ReadyState in a loop, with just a little sleep after using

use Time::HiRes 'sleep';

It turns out that on a page depending on JavaScript, ReadyState returns 3 for a little while, and then it jumps to 4. That would seem like a pretty reliable way to tell whether the page is actually finished.
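For anyone who wants to watch the transitions themselves, here is a minimal sketch of such a polling loop. It is only an illustration: trace_ready_state is a hypothetical helper (not part of Win32::IE::Mechanize), and the state source is a fake coderef so the snippet runs anywhere. With the real module you would presumably pass sub { $ie->{agent}->ReadyState } instead.

```perl
use strict;
use warnings;
use Time::HiRes 'sleep';

# Hypothetical helper (not part of Win32::IE::Mechanize): poll a state
# source and record each change, so a transition like 3 -> 4 shows up.
sub trace_ready_state {
    my ($get_state, $interval, $max_polls) = @_;
    my @seen;
    for (1 .. $max_polls) {
        my $state = $get_state->();
        # Only record the state when it differs from the last one seen.
        push @seen, $state if !@seen || $seen[-1] != $state;
        last if $state >= 4;    # 4 is READYSTATE_COMPLETE
        sleep $interval;
    }
    return @seen;
}

# Assumption: a fake state source standing in for the browser.
# It reports 1, then 3 a few times, then stays at 4.
my @fake = (1, 3, 3, 3, 4);
my @seen = trace_ready_state(
    sub { @fake > 1 ? shift @fake : $fake[0] },
    0.01,    # seconds between polls
    20,      # give up after this many polls
);
print "states seen: @seen\n";
```

Recording only the changes keeps the output short even when the state sits at 3 for many polls.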

Checking the source for _wait_while_busy() in Win32::IE::Mechanize (0.008), I spotted the comment:

# The documentation isn't clear on this.
# The DocumentComplete event roughly says:
# the event gets fired (for each frame) after ReadyState == 4

That points in the same direction. Perhaps access to ReadyState should be more formalized, but for now, the next snippet seems to work well for me:

my $url = '...';  # you choose
$ie->get($url);
use Time::HiRes 'sleep';
while($ie->{agent}->ReadyState < 4) {
    sleep 0.055;
}
$\ = "\n";
print $_->url foreach $ie->links;

Note that I picked 55 ms for the sleep time because that appears to be roughly the resolution of the timer in Windows. It also looks like a good compromise to me: neither too fast nor too slow.
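A loop like this spins forever if the page never reaches state 4, so a deadline may be worth adding. The following is a sketch under assumptions, not something taken from the module: wait_until_complete is a hypothetical helper, and the coderefs below are stand-ins for sub { $ie->{agent}->ReadyState }.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: poll until the state reaches 4 (complete) or the
# timeout expires. Returns 1 on completion, 0 on timeout.
sub wait_until_complete {
    my ($get_state, $timeout, $interval) = @_;
    my $deadline = time() + $timeout;
    while ($get_state->() < 4) {
        return 0 if time() >= $deadline;
        sleep $interval;
    }
    return 1;
}

# Assumption: a fake state source that reports 3 twice, then 4.
my $polls = 0;
my $ok = wait_until_complete(sub { ++$polls < 3 ? 3 : 4 }, 2, 0.055);
print $ok ? "complete after $polls polls\n" : "timed out\n";

# A source that never gets past 3 makes the helper give up.
my $stuck = wait_until_complete(sub { 3 }, 0.2, 0.055);
print $stuck ? "complete\n" : "timed out, as expected\n";
```

The return value lets the caller decide whether to retry the get or carry on with whatever has loaded so far.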


In reply to Re: Win32::IE::Mechanize completed? by bart
in thread Win32::IE::Mechanize completed? by 2ge
