in reply to Re: WWW::Mechanize::Firefox delayed returns / slow
in thread WWW::Mechanize::Firefox delayed returns / slow

Success! I debugged and hacked some of the pm code and got the mech to go without hangs or overly long pauses.

What I did is just a kludge, but it could hint at the real solution.

I found that get() calls synchronize(), which is slow (5-20 secs) at the $callback->() call. The bigger problem, though, is that after get()ing a couple of pages, the subsequent _wait_while_busy($load_lock) hangs forever.

So in _wait_while_busy I added my $start = time; before the while loop, and return $element if time - $start > 5; at the end of the for loop. That makes the wait give up after roughly 5 seconds instead of looping forever.
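
The same idea can be sketched outside the module. This is only an illustration of a polling loop with a deadline, not the actual body of _wait_while_busy; the helper name wait_with_timeout and its parameters are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Mechanize::Firefox:
# call $check repeatedly until it returns a true value, or give up
# once $timeout seconds (whole-second resolution) have passed.
sub wait_with_timeout {
    my ($check, $timeout, $interval) = @_;
    $interval = 0.1 unless defined $interval;

    my $start = time;
    while (1) {
        my $result = $check->();
        return $result if $result;               # the condition was met
        return if time - $start >= $timeout;     # deadline reached, give up
        select(undef, undef, undef, $interval);  # sleep for a fraction of a second
    }
}

# Usage: the check succeeds on the third poll.
my $polls  = 0;
my $answer = wait_with_timeout(sub { ++$polls >= 3 ? 'done' : 0 }, 5, 0.01);
print "result: $answer after $polls polls\n";
```

Returning nothing on timeout lets the caller tell "gave up" apart from "got a result", which my kludge does not do.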

Now my program seems to work as expected! Hooray!

Obviously we're waiting on events that never happen. Again, some global timeout might help. Or, a rejig of this code if this is a true bug (and not just a quirk of the site I'm crawling).

I can continue to help as required. Thanks!

Re^3: WWW::Mechanize::Firefox delayed returns / slow
by Corion (Patriarch) on Dec 02, 2010 at 08:39 UTC

    ->synchronize waits on the appropriate event to fire in Firefox. If it takes longer than you expect, most likely you're waiting for the wrong event. See the ->events method and the events => argument to the constructor on how to define your own events.

    Just adding a timeout in ->_wait_while_busy only papers over the fact that you're not receiving (or rather, listening to) the right event.

    Now, finding the right event to listen to requires some ingenuity, as the default events (DOMFrameContentLoaded, DOMContentLoaded, error, abort, stop) don't seem to fire "soon enough" in your case. Maybe load is another good event to listen to, but it fires before subframes have loaded.

      My program is very simple. I am not doing any event stuff myself. I'm just doing extremely simple get() and saveurl(). However, I see what you are saying about events now that I've been playing with the module code for a few days.

      I'll see if I can figure out how to add the load event. I'm pretty sure my site has no subframes.

      I've taken to adding Data::Dumper calls to many places in the module to see what's going on. If any of those would help you, let me know. Most of the time the 100 pages of perl code that is Dumped for one variable leaves me dazed & confused. I've taken to restricting it to 2-3 levels deep.
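
For anyone else drowning in Dumper output: the depth limit can be set with $Data::Dumper::Maxdepth. A minimal sketch, where the nested hash is made-up data standing in for one of the module's big structures:

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up nested structure standing in for a large object.
my $data = { a => { b => { c => { d => 1 } } } };

# Limit how deep Dumper descends; 0 (the default) means no limit.
local $Data::Dumper::Maxdepth = 2;
local $Data::Dumper::Indent   = 1;
local $Data::Dumper::Sortkeys = 1;

# Anything below the second level is printed as a plain
# stringified reference such as 'HASH(0x...)'.
my $dump = Dumper($data);
print $dump;
```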

      Thanks, I will try more things and report back!

      I've added a lot of debug code to the module to try various things. Maybe you can help make sense of the results?

      First off, a very strange thing: I put a warn before and after $callback->() in synchronize(). On the very first hit to that code in a run it always takes 30+ secs to execute a dummy callback!!? The output is below. The long number is time(). The $VAR output is Dumper($callback).

      1291361165 zb before cb
      $VAR1 = sub { "DUMMY" };
      1291361200 zc after cb, before wait

      After that first hit all subsequent $callback->() (which are also all DUMMY) take 0 secs! This doesn't make any sense to me at all. How can a dummy call take 30 secs?
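
Since time() only has one-second resolution, a finer measurement may help here. A rough sketch using Time::HiRes; the helper name timed and the demo callback are invented for the example and are not part of the module:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);   # replaces time() with a sub-second version

# Hypothetical helper: run $code, warn how long it took, and
# return the elapsed seconds followed by whatever $code returned.
sub timed {
    my ($label, $code) = @_;
    my $t0      = time;
    my @result  = $code->();
    my $elapsed = time - $t0;
    warn sprintf("%s took %.3f s\n", $label, $elapsed);
    return ($elapsed, @result);
}

# Usage with a stand-in callback that sleeps for about 0.2 seconds.
my ($elapsed, $value) = timed('demo callback', sub {
    select(undef, undef, undef, 0.2);
    return 'ok';
});
```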

      I've tried adding load to the events (globally). Right before the $load_lock=... in synchronize() I added a Dumper($events) to confirm it's there:

      $VAR1 = [ 'load', 'DOMFrameContentLoaded', 'DOMContentLoaded', 'error', 'abort', 'stop' ];

      I also tried it with just load (no other events). In all cases _wait_while_busy loops until my 20sec timer stops it, so no change there. Any other events I can wait on?

      I also Dumper'd the $element in the for $element loop of _wait_while_busy, but that returns some massive structures that don't mean much to me (you're losing me with all the javascript code). However, if tidbits could help debug, I can post here as required.

        Data::Dumper by default replaces every code reference with sub { "DUMMY" }, so the dump does not show what the callbacks really do. Most likely, these callbacks wait for some results from Firefox.
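
A small self-contained demonstration of that placeholder, assuming nothing about the module itself (the callback here is made up). Setting $Data::Dumper::Deparse makes Dumper show the reconstructed source instead:

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up callback; the real ones in the module do far more.
my $callback = sub { return 42 };

local $Data::Dumper::Indent = 0;

# Default behaviour: the body of the code reference is not shown.
my $default = Dumper($callback);

# With Deparse enabled, Dumper uses B::Deparse to reconstruct the source.
my $deparsed = do {
    local $Data::Dumper::Deparse = 1;
    Dumper($callback);
};

print "$default\n$deparsed\n";

# The placeholder only affects the dump; the callback still runs normally.
my $value = $callback->();
```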

        If you have a sample script and the website is somewhat public, I can try to reproduce the slowness.