in reply to Re^7: WWW::Mechanize::Firefox delayed returns / slow
in thread WWW::Mechanize::Firefox delayed returns / slow

OK, I've made a good small sample program using a public site.

#!/usr/bin/perl -w
$tabregex='deviantART';
$|=1;
use Data::Dumper;
use WWW::Mechanize::Firefox;
$ENV{'SHELL'}='/bin/bash';
$Data::Dumper::Maxdepth=3;
$www=WWW::Mechanize::Firefox->new(
    stack_depth=>5,
    autodie=>1,
    timeout=>60,
    tab=>qr/$tabregex/,
    bufsize => 50000000
);
$www->events(['load','onload','loaded','DOMFrameContentLoaded','DOMContentLoaded','error','abort','stop']);
print time." get #1\n";
$www->get('http://cmcc.deviantart.com/');
print time." after get #1\n";
print time." content #1\n";
$con=$www->content();
print time." after content #1\n";
print time." get #2\n";
$www->get('http://cmcc.deviantart.com/#/d1a8l1t');
print time." after get #2\n";
print time." content #2\n";
$con=$www->content();
print time." after content #2\n";
print time." saveurl #1\n";
$www->save_url('http://fc02.deviantart.net/fs30/i/2008/048/d/9/Wind_by_CMcC.jpg'=>'/tmp/Wind_by_CMcC');
print time." after saveurl #1\n";

With all the debug prints I added to the module, my output looks like this. Note the time() values, which show long delays at nearly every step: get #1 takes 67 seconds and content #1 another 37. This sample program runs far worse than my in-progress program, which at least mostly works now. Note that I've taken out the 20-second wait dropout code.

1291368442 get #1
1291368443 za before $self->_addEventListener($b,$events);
1291368443 zb before $callback->();
1291368443 zc before $self->_wait_while_busy($load_lock);
1291368443 testing elements, last element 0 ::: before for $element (@elements) {
1291368443 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
1291368443 uc _wait_while_busy sleep 1 ::: before sleep 1;
1291368444 testing elements, last element 0 ::: before for $element (@elements) {
1291368444 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
1291368509 returning from _wait_while_busy ::: before return $element;
1291368509 zd after wait
1291368509 after get #1
1291368509 content #1
1291368546 after content #1
1291368546 get #2
1291368547 za before $self->_addEventListener($b,$events);
1291368547 zb before $callback->();
1291368547 zc before $self->_wait_while_busy($load_lock);
1291368547 testing elements, last element 0 ::: before for $element (@elements) {
1291368547 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
1291368547 uc _wait_while_busy sleep 1 ::: before sleep 1;
1291368548 testing elements, last element 0 ::: before for $element (@elements) {
1291368548 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
Deep recursion on subroutine "MozRepl::RemoteObject::Instance::__attr" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1342, <DATA> line 1.
Deep recursion on subroutine "MozRepl::RemoteObject::unjson" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1000, <DATA> line 1.

I got a ton of deep recursion warnings and had to ^C it. Note how long the content() call takes; while it runs, it uses 100% of one of my cores.

Re^9: WWW::Mechanize::Firefox delayed returns / slow
by Corion (Patriarch) on Dec 03, 2010 at 14:36 UTC

    I used the following, slightly changed, script, but it works very well for me. I've removed the superfluous event setting and parameters that WWW::Mechanize::Firefox doesn't support:

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize::Firefox;

    my $www=WWW::Mechanize::Firefox->new(
        #stack_depth=>5,
        autodie=>1,
        timeout=>60,
        #tab=>qr/$tabregex/,
        bufsize => 50_000_000,
    );
    #$www->events(['load','onload','loaded','DOMFrameContentLoaded','DOMContentLoaded','error','abort','stop']);
    print time." get #1\n";
    $www->get('http://cmcc.deviantart.com/');
    print time." after get #1\n";
    print time." content #1\n";
    my $con=$www->content();
    print time." after content #1\n";
    print time." get #2\n";
    $www->get('http://cmcc.deviantart.com/#/d1a8l1t');
    print time." after get #2\n";
    print time." content #2\n";
    $con=$www->content();
    print time." after content #2\n";

    Note that I had to allow JavaScript in the NoScript plugin for deviantart.net, as the second URL uses JavaScript to display a single image. Other than that, I get the following (relatively quick) output:

    1291386396 get #1
    1291386398 after get #1
    1291386398 content #1
    1291386398 after content #1
    1291386398 get #2
    1291386399 after get #2
    1291386399 content #2
    1291386399 after content #2

    If you are feeling adventurous, you can use the following modified JavaScript to make Firefox display an alert box whenever it captures an event:

    sub _addEventListener {
        my ($self,$browser,$events) = @_;
        $events ||= $self->events;
        $events = [$events]
            unless ref $events;

        # This registers multiple events for a one-shot event
        my $make_semaphore = $self->repl->declare(<<'JS');
    function(browser,events) {
        var lock = {};
        lock.busy = 0;
        var b = browser;
        var listeners = [];
        for( var i = 0; i < events.length; i++) {
            var evname = events[i];
            var callback = (function(listeners,evname){
                return function(e) {
                    if (! lock.busy) {
                        lock.busy++;
                        lock.event = evname;
                        lock.js_event = {};
                        lock.js_event.target = e.originalTarget;
                        lock.js_event.type = e.type;
                        alert("Caught first event " + e.type + " " + e.message);
                    } else {
                        alert("Caught duplicate event " + e.type + " " + e.message);
                    };
                    for( var j = 0; j < listeners.length; j++) {
                        b.removeEventListener(listeners[j][0],listeners[j][1],true);
                    };
                };
            })(listeners,evname);
            listeners.push([evname,callback]);
            b.addEventListener(evname,callback,true);
        };
        return lock
    }
    JS
        return $make_semaphore->($browser,$events);
    };
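
The one-shot pattern in the JavaScript above (several listeners sharing one lock, all of them removed once any one fires) can also be tried outside Firefox. The following is only a minimal sketch, assuming Node.js: `makeSemaphore`, `fakeBrowser`, and the use of `EventEmitter` as a stand-in for the browser object are illustrative names and assumptions, not part of WWW::Mechanize::Firefox or MozRepl.

```javascript
const { EventEmitter } = require('events');

// Register one listener per event name. The first event to fire takes the
// lock; every listener is then removed, so later events are ignored.
// (Hypothetical helper, modelled loosely on the function above.)
function makeSemaphore(target, events) {
  const lock = { busy: 0 };
  const listeners = [];
  for (const evname of events) {
    const callback = (payload) => {
      if (!lock.busy) {
        lock.busy++;
        lock.event = evname;     // which event won the race
        lock.payload = payload;  // whatever the emitter passed along
      }
      // One-shot: unregister all listeners, including this one.
      for (const [name, fn] of listeners) {
        target.removeListener(name, fn);
      }
    };
    listeners.push([evname, callback]);
    target.on(evname, callback);
  }
  return lock;
}

// Hypothetical stand-in for the browser object.
const fakeBrowser = new EventEmitter();
const lock = makeSemaphore(fakeBrowser, ['DOMContentLoaded', 'load', 'stop']);

fakeBrowser.emit('DOMContentLoaded', { type: 'DOMContentLoaded' });
fakeBrowser.emit('load', { type: 'load' }); // ignored: listeners already removed

console.log(lock.busy, lock.event);             // 1 DOMContentLoaded
console.log(fakeBrowser.listenerCount('load')); // 0
```

A caller waiting for a page load only has to poll `lock.busy`, which is the value the `_wait_while_busy` debug output in the original post is testing. If none of the registered events ever reaches the listeners, `busy` stays 0 and the wait runs until something else ends it.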