2ge has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am playing with Win32::IE::Mechanize (I need it, and it is good), but I have one question about waiting for a page to load completely. My code snippet:
use strict;
use warnings;
use Win32::IE::Mechanize;

my $ie = Win32::IE::Mechanize->new(
    visible => 1,
    left    => 0,
    top     => 0,
    height  => 950,
    width   => 1280,
);
$ie->get('http://www.google.com');    # example!
sleep 5;
$, = "\n";
print my @links = $ie->links;

This works OK, but I have a question about "sleep 5;". I didn't find any method that returns the status of the page, and I don't want to use sleep 5.
In the docs I found:

$ie->success
    Return true for ReadyState >= 2;

and in the source it is:

sub success { $_[0]->{agent}->ReadyState >= 2 }

So I played with these values and also put $ie->get in a loop, but nothing helps. Is there really no workaround, so that I have to use sleep?

Replies are listed 'Best First'.
Re: Win32::IE::Mechanize completed?
by bart (Canon) on Apr 12, 2005 at 08:52 UTC
    You're actually waiting for MSIE to finish building the page. For Google, I see no problem, but for a page that depends partly on Javascript to complete the page (using document.write(), for example), you have to wait.

    What I've done until now, is load the page twice, and then wait a second. Not great, but it worked rather well. But you just gave me a new hint.

    So I tried printing out $ie->{agent}->ReadyState in a loop, with just a little sleep after using

    use Time::HiRes 'sleep';

    It turns out that on a page depending on Javascript, for a little while, ReadyState returns 3, and then it jumps to 4. That would seem like a pretty reliable way to get to see if the page is actually finished.

    Checking the source for _wait_while_busy() in Win32::IE::Mechanize (0.008), I spotted the comment:

    # The documentation isn't clear on this.
    # The DocumentComplete event roughly says:
    # the event gets fired (for each frame) after ReadyState == 4

    That points in the same direction. Perhaps access to ReadyState should be more formalized, but for now, the next snippet seems to work well for me:

    my $url = '...';    # you choose
    $ie->get($url);
    use Time::HiRes 'sleep';
    while ($ie->{agent}->ReadyState < 4) {
        sleep 0.055;
    }
    $\ = "\n";
    print $_->url foreach $ie->links;

    Note that I picked 55ms for the sleep time, because that appears to roughly be the resolution of the timer in Windows. It also looks like a good compromise to me, not too fast, nor too slow.

      Hello bart,

      thanks for the nice reply. I tried this before, and I was playing with ReadyState, but for me it sometimes jumps to 4 and sometimes stays at 3 (even when the IE status shows "Done"). Does it work for you on any page? Try huge pages, for example http://www.albinoblacksheep.com/; on sites that use Flash it doesn't work. Also, when I go through an open proxy it often leaves me at state 3, even on "easy" pages (XHTML + JS). So I can't rely on this to tell whether the page is finished. But it is better than nothing: now we can specify a timeout, extract the links once it expires, and reload if no links were found :). That is still better than a fixed sleep time. OK, thanks.
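The timeout idea could be sketched roughly like this. It is only a sketch: wait_until is a hypothetical helper, not part of Win32::IE::Mechanize, and the 10 second limit in the usage comment is an arbitrary assumption. It polls a condition every 55 ms, as in bart's loop, but gives up once the deadline passes instead of spinning forever on a page that never leaves ReadyState 3.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper (not part of Win32::IE::Mechanize): call $check
# repeatedly until it returns true or $timeout seconds have passed.
# Returns 1 if the condition became true, 0 if the timeout expired first.
sub wait_until {
    my ($check, $timeout, $interval) = @_;
    $interval ||= 0.055;    # same polling step as in the loop above
    my $deadline = time() + $timeout;
    while (1) {
        return 1 if $check->();
        return 0 if time() >= $deadline;
        sleep $interval;
    }
}

# Intended use with the $ie object from the snippets above (Windows only,
# the 10 second limit is an assumption):
#
#   $ie->get($url);
#   my $done  = wait_until(sub { $ie->{agent}->ReadyState >= 4 }, 10);
#   my @links = $ie->links;
#   $ie->get($url) if !$done && !@links;    # reload once as a fallback
```

The condition is passed as a code reference so the helper itself does not depend on IE; the same loop can wait on ReadyState, on the number of links found, or on both.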