in reply to Re^5: WWW::Mechanize::Firefox delayed returns / slow
in thread WWW::Mechanize::Firefox delayed returns / slow

Since all the problems seem to happen on the second and subsequent calls, I tried replacing my single long-lived mech object with a fresh new/call/undef cycle for every page. So instead of creating one mech object and doing lots of get()s, I am doing:

$www = WWW::Mechanize::Firefox->new();
$www->events();
$www->get();
undef $www;

Strangely enough, this change does nothing at all to the _wait_while_busy hang behaviour! Well, it did change one thing: the very first callback now takes only 0 seconds. Very strange. I'm reverting to my old code, which makes only one call to w:m:ff->new().

I should note that I am doing nothing else with the pages I'm loading. I'm not filling in forms or doing other mech stuff. I'm just get()ing, running some regexes on the content(), doing a save_url() of a related file, and moving on to the next get().
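A minimal sketch of that workflow, for context (the URL list, the regex, and the output filename are placeholders; save_url() is the documented WWW::Mechanize::Firefox method for saving a resource to disk):

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
my @urls = ( 'http://example.com/page1', 'http://example.com/page2' );  # placeholder list

for my $url (@urls) {
    $mech->get($url);
    my $html = $mech->content();
    # Example regex only -- pull out a link to a related file:
    if ( $html =~ m{href="([^"]+\.pdf)"} ) {
        $mech->save_url( $1, 'related.pdf' );  # save the linked resource locally
    }
}
```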

I also tried adding this to the wait loop, to no effect:

$self->repl->poll;

Replies are listed 'Best First'.
Re^7: WWW::Mechanize::Firefox delayed returns / slow
by Corion (Patriarch) on Dec 03, 2010 at 12:47 UTC

    Maybe it is the call to ->content that is slow? Consider using ->selector() and/or ->xpath to extract the element and then ->{innerHTML} to get at its contents.

    All your changing around of the ->_wait_while_busy subroutine will only destabilize the whole thing, as your script will no longer wait for Firefox to signal that it is ready. Doing this without knowing when and why to do it will only end in tears. It is an internal method and should not be bypassed lightly (and if you do want to bypass it, there are most likely routines more to the point than this one).

      Thanks a lot, Corion! For future reference, I will explain how your solution solved my problem (four years later).

      I am scraping a page on which ->content() is painfully slow. I tried to replace it with

      my $www = WWW::Mechanize::Firefox->new();
      my $sel = $www->selector('body', single => 1);
      my $content = $sel->{innerHTML};
      and that second line (a simple assignment) was still just as slow. I dumped $content to a file, which weighs almost 1 MB. That doesn't seem that big. Is 1 MB too big for an assignment?

      Anyway, I chose a more restrictive selector and now it's blazing fast. If you come across the same problem, this solution might save you too.
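      For illustration, the kind of change involved (the selector here is hypothetical; the point is to select the smallest element that contains your data, rather than the whole body):

```perl
# Fetch only the fragment we care about, not the entire <body>:
my $table   = $www->selector( 'table#data', single => 1 );  # hypothetical selector
my $content = $table->{innerHTML};  # far less text to serialize and transfer
```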

        Thank you very much for sharing how you made your script faster!

        That "simple assignment" is not so simple behind the scenes. It involves converting the DOM in the Firefox process to a text string and then copying that text string to the Perl process. The network connection between Firefox and Perl is not as fast as it could be, unfortunately.