in reply to Re^4: WWW::Mechanize::Firefox delayed returns / slow
in thread WWW::Mechanize::Firefox delayed returns / slow

By default, Data::Dumper replaces all code references (including callbacks) with sub { "DUMMY" }. Most likely, these callbacks wait for results from Firefox.
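
For example, this is all you see of a code reference unless you turn on deparsing:

    use Data::Dumper;
    my $handlers = { onload => sub { print "page loaded\n" } };
    print Dumper($handlers);
    # prints:
    # $VAR1 = {
    #           'onload' => sub { "DUMMY" }
    #         };
    # Set $Data::Dumper::Deparse = 1 to see a deparsed version
    # of the real body instead.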

If you have a sample script and the website is somewhat public, I can try to reproduce the slowness.

Re^6: WWW::Mechanize::Firefox delayed returns / slow
by tcordes (Novice) on Dec 03, 2010 at 08:09 UTC

    Ah yes, that makes sense. When I commented out the $callback line, nothing would happen in the browser :-) I didn't think Dumper was lying to me.

    Does it still strike you as odd that the first $callback->() call takes 30-35 secs every time, but 2nd+ calls return immediately?

    Also, I just noticed that while the first call to the callback is slow, the program's first pass through the _wait_while_busy while loop takes only 1 sec. It's always the 2nd+ calls that hang. Does that make any sense?

    Also, why have you commented out the if ($need_response) in synchronize()? Since I never care about responses, I'm playing around with commenting out the $response_catcher= assignment to avoid all the voodoo in _install_response_header_listener. As you can see, I'm shooting in the dark, but experimenting can't hurt anything.

    The site I am working with is a semi-private intranet. To possibly get you access I'd have to jump through a lot of hoops.

    If you can just throw me little crumbs of help I can do all the grunt work testing/debugging.

    Thanks!

      The $need_response is a (failed) optimization. I always need to store the response, even if it is not requested immediately. Later on, you might ask for $mech->code or other stuff contained only in the response.
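
      To illustrate (a hypothetical snippet, not from your script): the call site may ignore the response, but a later line can still need it.

      # assuming $mech is the WWW::Mechanize::Firefox object
      $mech->get('http://example.com/');   # the response is not asked for here ...
      # ... but this only works if the response was stored anyway:
      print "HTTP status: ", $mech->code, "\n";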

      The rest of the behaviour depends on the site in question, so I can't really say what makes it happen without seeing some more, sorry.

        OK, I've made a good small sample program using a public site.

        #!/usr/bin/perl -w
        $tabregex='deviantART';
        $|=1;
        use Data::Dumper;
        use WWW::Mechanize::Firefox;
        $ENV{'SHELL'}='/bin/bash';
        $Data::Dumper::Maxdepth=3;
        $www=WWW::Mechanize::Firefox->new( stack_depth=>5, autodie=>1,
            timeout=>60, tab=>qr/$tabregex/, bufsize => 50000000 );
        $www->events(['load','onload','loaded','DOMFrameContentLoaded',
            'DOMContentLoaded','error','abort','stop']);
        print time." get #1\n";
        $www->get('http://cmcc.deviantart.com/');
        print time." after get #1\n";
        print time." content #1\n";
        $con=$www->content();
        print time." after content #1\n";
        print time." get #2\n";
        $www->get('http://cmcc.deviantart.com/#/d1a8l1t');
        print time." after get #2\n";
        print time." content #2\n";
        $con=$www->content();
        print time." after content #2\n";
        print time." saveurl #1\n";
        $www->save_url('http://fc02.deviantart.net/fs30/i/2008/048/d/9/Wind_by_CMcC.jpg'=>'/tmp/Wind_by_CMcC');
        print time." after saveurl #1\n";

        With all the added debug prints I put in the module, my output looks like this. Note the time() values, which show insane delays at nearly every step. In fact, this sample program runs horribly; it's far worse than my in-progress program, which at least mostly works now. Note that I've taken out the 20-sec-wait dropout code.

        1291368442 get #1
        1291368443 za before $self->_addEventListener($b,$events);
        1291368443 zb before $callback->();
        1291368443 zc before $self->_wait_while_busy($load_lock);
        1291368443 testing elements, last element 0 ::: before for $element (@elements) {
        1291368443 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
        1291368443 uc _wait_while_busy sleep 1 ::: before sleep 1;
        1291368444 testing elements, last element 0 ::: before for $element (@elements) {
        1291368444 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
        1291368509 returning from _wait_while_busy ::: before return $element;
        1291368509 zd after wait
        1291368509 after get #1
        1291368509 content #1
        1291368546 after content #1
        1291368546 get #2
        1291368547 za before $self->_addEventListener($b,$events);
        1291368547 zb before $callback->();
        1291368547 zc before $self->_wait_while_busy($load_lock);
        1291368547 testing elements, last element 0 ::: before for $element (@elements) {
        1291368547 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
        1291368547 uc _wait_while_busy sleep 1 ::: before sleep 1;
        1291368548 testing elements, last element 0 ::: before for $element (@elements) {
        1291368548 testing element, before if ::: before if ((my $s = $element->{busy} || 0) >= 1) {
        Deep recursion on subroutine "MozRepl::RemoteObject::Instance::__attr" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1342, <DATA> line 1.
        Deep recursion on subroutine "MozRepl::RemoteObject::unjson" at /usr/lib/perl5/site_perl/5.10.0/MozRepl/RemoteObject.pm line 1000, <DATA> line 1.

        I got a ton of deep recursion errors and had to ^C it. Note how long the content() call takes; while it runs, it eats 100% of one of my cores.

        Is there some code I could plunk in to see which Firefox events *are* firing through MozRepl? Right now I'm putting in every event name I can find on the net, hoping to hit the right one.
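
        What I'm imagining is something like the sketch below. The method names are my guesses at the MozRepl::RemoteObject API (I'm assuming it can wrap a Perl sub as a JS event callback and deliver queued calls on ->poll), so treat it as untested:

        # $www is the WWW::Mechanize::Firefox object from my script
        my $browser = $www->tab->{linkedBrowser};
        for my $name (qw(load DOMContentLoaded pageshow error abort stop)) {
            # print a timestamped line whenever this event fires
            $browser->addEventListener($name, sub {
                print time, " saw event: $name\n";
            }, 1);
        }
        # pump the repl so any queued callbacks actually get delivered
        $www->repl->poll for 1 .. 60;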

        I'm going to try to find a public site I can replicate this problem on with a smaller sample program.

Re^6: WWW::Mechanize::Firefox delayed returns / slow
by tcordes (Novice) on Dec 03, 2010 at 08:33 UTC

    Since all the problems seem to happen on the 2nd+ calls, I tried replacing my single long-lived mech object with a fresh new/call/undef cycle for every request. So instead of creating one mech and doing lots of get()s, I am doing:

    $www = WWW::Mechanize::Firefox->new();
    $www->events();
    $www->get();
    undef $www;

    Strangely enough, this change does nothing at all to the _wait_while_busy hang behaviour! Well, it did change one thing: the very first callback call took only 0 sec. Very strange. I'm reverting to my old code, which makes only one call to WWW::Mechanize::Firefox->new().

    I should note that I am doing nothing fancy with the pages I'm loading: no form filling or other mech stuff. I'm just get()ing, running some regexes on the content(), then doing a save_url() of a related file, and on to the next get().

    I also tried adding this to the wait while loop, to no effect:

        $self->repl->poll;

      Maybe it is the call to ->content that is slow? Consider using ->selector() and/or ->xpath() to find just the element you need, and then ->{innerHTML} to get at its contents.
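
      An untested sketch of what I mean; '#content' is a stand-in for whatever CSS selector matches the part of the page you actually need:

      # fetch just the interesting element instead of serializing
      # the whole document through the repl:
      my $node = $mech->selector('#content', single => 1);
      my $html = $node->{innerHTML};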

      All your changing around of the ->_wait_while_busy subroutine will only destabilize things, as your script will no longer wait for Firefox to signal that it is ready. Doing this without knowing when and why to do it will only end in tears. It is an internal method and should not be bypassed lightly (and if you do want to bypass it, there are most likely routines more to the point than this one).

        Thanks a lot, Corion! For future reference, I will explain how your solution solved my problem (4 years later).

        I am scraping a page on which ->content() is sooo slow. I tried to replace it with:

        # $mech is my WWW::Mechanize::Firefox object
        my $sel = $mech->selector('body', single => 1);
        my $content = $sel->{innerHTML};
        and that second line was still just as slow. It looks like a simple assignment, but it pulls the entire innerHTML back across the MozRepl connection. I copied $content to a file, and it weighs almost 1MB, which doesn't seem that big. Is 1MB too much to transfer that way?

        Anyway, I chose a more restrictive selector and now it's blazing fast. If you come across the same problem, this solution might save you too.
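
        For the record, the shape of the fix (the 'div#results' selector is just a placeholder; use whatever wraps only the data you parse):

        # before: the whole <body>, almost 1MB dragged through the repl
        # after: just the element my regexes actually run on
        my $sel = $mech->selector('div#results', single => 1);
        my $content = $sel->{innerHTML};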