in reply to Re^5: WWW::Mechanize::Firefox delayed returns / slow
in thread WWW::Mechanize::Firefox delayed returns / slow

Ah yes, that makes sense. When I commented out the $callback line, nothing happened in the browser :-) I didn't think Dumper was lying to me.

Does it still strike you as odd that the first $callback->() call takes 30-35 seconds every time, while the 2nd and later calls are immediate?

Also, I just noticed that while the first call to the callback is slow, the program's first pass through the _wait_while_busy while loop takes only 1 second; it's always the 2nd and later passes that hang. Does that make any sense?
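To pin down where the time goes, a small timing wrapper can help isolate the "slow first call, fast later calls" pattern. This is only a generic sketch: the timed() helper and the demo callback below are hypothetical, not part of WWW::Mechanize::Firefox.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap any coderef so each call prints its
# wall-clock duration, making first-call vs. later-call costs visible.
sub timed {
    my ($name, $code) = @_;
    return sub {
        my $start  = time();
        my @result = $code->(@_);
        printf "%s took %d sec\n", $name, time() - $start;
        return wantarray ? @result : $result[0];
    };
}

# Demo callback that is slow only on its first invocation.
my $slow_once = do {
    my $first = 1;
    sub { sleep 2 if $first; $first = 0; return "done" };
};

my $wrapped = timed('callback', $slow_once);
print $wrapped->(), "\n";   # slow first call
print $wrapped->(), "\n";   # fast second call
```

Wrapping the suspect calls ($callback->(), _wait_while_busy) this way would show exactly which step eats the 30-35 seconds.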

Also, why have you commented out the if ($need_response) check in synchronize()? Since I never care about responses, I'm experimenting with commenting out the $response_catcher= assignment to avoid all the voodoo in _install_response_header_listener. As you can see, I'm shooting in the dark, but experimenting can't hurt anything.

The site I am working with is a semi-private intranet. To possibly get you access I'd have to jump through a lot of hoops.

If you can just throw me little crumbs of help I can do all the grunt work testing/debugging.

Thanks!


Re^7: WWW::Mechanize::Firefox delayed returns / slow
by Corion (Patriarch) on Dec 03, 2010 at 08:22 UTC

    The $need_response flag is a (failed) optimization. I always need to store the response, even if it is not requested immediately: later on, you might ask for $mech->code or other information contained only in the response.
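The point about storing the response unconditionally can be sketched with a toy class. My::FakeMech and its methods are purely illustrative, not the module's real code: if _synchronize() only stored the response when a $need_response flag was set, a later call to code() would find nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy illustration (NOT the real module): the response is cached at
# synchronize time so that accessors called much later still work.
package My::FakeMech;
sub new { bless { response => undef }, shift }

sub _synchronize {
    my ($self, $response) = @_;
    # Store unconditionally; no $need_response gate.
    $self->{response} = $response;
}

sub code { my $self = shift; $self->{response}{code} }

package main;
my $mech = My::FakeMech->new;
$mech->_synchronize({ code => 200 });   # nobody asked for the response yet...
print $mech->code, "\n";                # ...but code() still finds it later
```

This is why commenting out the $response_catcher= assignment would break later accessors even in a program that "never cares about responses" at fetch time.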

    The rest of the behaviour depends on the site in question, so I can't really say what makes it happen without seeing some more, sorry.

      OK, I've made a good small sample program using a public site.

      #!/usr/bin/perl -w
      $tabregex='deviantART';
      $|=1;
      use Data::Dumper;
      use WWW::Mechanize::Firefox;
      $ENV{'SHELL'}='/bin/bash';
      $Data::Dumper::Maxdepth=3;
      $www=WWW::Mechanize::Firefox->new(
          stack_depth=>5, autodie=>1, timeout=>60,
          tab=>qr/$tabregex/, bufsize => 50000000 );
      $www->events(['load','onload','loaded','DOMFrameContentLoaded','DOMContentLoaded','error','abort','stop']);
      print time." get #1\n";
      $www->get('http://cmcc.deviantart.com/');
      print time." after get #1\n";
      print time." content #1\n";
      $con=$www->content();
      print time." after content #1\n";
      print time." get #2\n";
      $www->get('http://cmcc.deviantart.com/#/d1a8l1t');
      print time." after get #2\n";
      print time." content #2\n";
      $con=$www->content();
      print time." after content #2\n";
      print time." saveurl #1\n";
      $www->save_url('http://fc02.deviantart.net/fs30/i/2008/048/d/9/Wind_by_CMcC.jpg'=>'/tmp/Wind_by_CMcC');
      print time." after saveurl #1\n";

      With all the added debug prints I put in the module, my output looks like this. Note the time() values, which show insane delays at nearly every step. In fact, this sample program runs simply horribly; it's infinitely worse than my in-progress program, which at least mostly works now. Note that I've taken out the 20-second-wait dropout code.

      1291368442 get #1
      1291368443 za before $self->_addEventListener($b,$events);
      1291368443 zb before $callback->();
      1291368443 zc before $self->_wait_while_busy($load_lock);
      1291368443 testing elements, last element 0 ::: before for $element (@elements) {
      1291368443 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
      1291368443 uc _wait_while_busy sleep 1 ::: before sleep 1;
      1291368444 testing elements, last element 0 ::: before for $element (@elements) {
      1291368444 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
      1291368509 returning from _wait_while_busy ::: before return $element;
      1291368509 zd after wait
      1291368509 after get #1
      1291368509 content #1
      1291368546 after content #1
      1291368546 get #2
      1291368547 za before $self->_addEventListener($b,$events);
      1291368547 zb before $callback->();
      1291368547 zc before $self->_wait_while_busy($load_lock);
      1291368547 testing elements, last element 0 ::: before for $element (@elements) {
      1291368547 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
      1291368547 uc _wait_while_busy sleep 1 ::: before sleep 1;
      1291368548 testing elements, last element 0 ::: before for $element (@elements) {
      1291368548 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
      Deep recursion on subroutine "MozRepl::RemoteObject::Instance::__attr" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1342, <DATA> line 1.
      Deep recursion on subroutine "MozRepl::RemoteObject::unjson" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1000, <DATA> line 1.

      I got a ton of deep-recursion errors and had to ^C it. Note how long the content() call takes, and that while it runs it takes up 100% of one of my cores.

        I used the following, slightly changed script, and it works very well for me. I've removed the superfluous event setting and the parameters that WWW::Mechanize::Firefox doesn't support:

        #!/usr/bin/perl -w
        use strict;
        use WWW::Mechanize::Firefox;
        my $www=WWW::Mechanize::Firefox->new(
            #stack_depth=>5,
            autodie=>1,
            timeout=>60,
            #tab=>qr/$tabregex/,
            bufsize => 50_000_000,
        );
        #$www->events(['load','onload','loaded','DOMFrameContentLoaded','DOMContentLoaded','error','abort','stop']);
        print time." get #1\n";
        $www->get('http://cmcc.deviantart.com/');
        print time." after get #1\n";
        print time." content #1\n";
        my $con=$www->content();
        print time." after content #1\n";
        print time." get #2\n";
        $www->get('http://cmcc.deviantart.com/#/d1a8l1t');
        print time." after get #2\n";
        print time." content #2\n";
        $con=$www->content();
        print time." after content #2\n";

        Note that I had to allow Javascript for deviantart.net in the NoScript plugin, as the second URL uses Javascript to display a single image. Other than that, I get the following (relatively quick) output:

        1291386396 get #1
        1291386398 after get #1
        1291386398 content #1
        1291386398 after content #1
        1291386398 get #2
        1291386399 after get #2
        1291386399 content #2
        1291386399 after content #2

        If you are feeling adventurous, you can use the following changed Javascript to make Firefox display an alert box whenever it captures an event:

        sub _addEventListener {
            my ($self,$browser,$events) = @_;
            $events ||= $self->events;
            $events = [$events]
                unless ref $events;

            # This registers multiple events for a one-shot event
            my $make_semaphore = $self->repl->declare(<<'JS');
        function(browser,events) {
            var lock = {};
            lock.busy = 0;
            var b = browser;
            var listeners = [];
            for( var i = 0; i < events.length; i++) {
                var evname = events[i];
                var callback = (function(listeners,evname){
                    return function(e) {
                        if (! lock.busy) {
                            lock.busy++;
                            lock.event = evname;
                            lock.js_event = {};
                            lock.js_event.target = e.originalTarget;
                            lock.js_event.type = e.type;
                            alert("Caught first event " + e.type + " " + e.message);
                        } else {
                            alert("Caught duplicate event " + e.type + " " + e.message);
                        };
                        for( var j = 0; j < listeners.length; j++) {
                            b.removeEventListener(listeners[j][0],listeners[j][1],true);
                        };
                    };
                })(listeners,evname);
                listeners.push([evname,callback]);
                b.addEventListener(evname,callback,true);
            };
            return lock
        }
        JS
            return $make_semaphore->($browser,$events);
        };

      Is there some code I could plunk in to see which Firefox events *are* firing, through MozRepl? I'm putting in the name of every event I can find on the net, hoping to hit the right one.
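One hedged way to watch which registered events actually fire is to log each one instead of relying on the one-shot lock. The sketch below only builds the JavaScript as a Perl string; the event-name list and the use of Components.utils.reportError (which writes to the Error Console in Firefox of that era) are my assumptions, and the commented-out lines show how it might be wired up through the repl with a live browser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical event logger: register a capturing listener for each
# event name of interest and report every firing to the Error Console,
# rather than popping alert boxes or removing listeners after one hit.
my $log_events_js = <<'JS';
function (browser, events) {
    for (var i = 0; i < events.length; i++) {
        browser.addEventListener(events[i], function (e) {
            Components.utils.reportError("event fired: " + e.type);
        }, true);
    };
}
JS

# With a connected WWW::Mechanize::Firefox object in $www, the wiring
# might look like this (untested assumption, requires a live browser):
#   my $log_events = $www->repl->declare($log_events_js);
#   $log_events->($www->tab->{linkedBrowser},
#                 ['load','DOMContentLoaded','error','abort']);

print length($log_events_js), " bytes of JS prepared\n";
```

Watching the Error Console while a page loads would then show which of the guessed event names ever fire at all.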

      I'm going to try to find a public site I can replicate this problem on with a smaller sample program.

        Sorry, but I'm not aware of any "catch-all" way to see a list of all events fired by Firefox (or by a Firefox window or browser object).