in reply to WWW::Mechanize::Firefox delayed returns / slow

The failures really look like as if Firefox fails to load local files. This is very weird.

The Perl program is likely waiting for Firefox/Javascript to send the event that the page has loaded completely. Why this is slow for you, I don't know. What version of Firefox are you running?

I'm not sure what is causing the delay for you. I think that maybe in upcoming versions, I'll add more debugging features that output more diagnostics.

  • Comment on Re: WWW::Mechanize::Firefox delayed returns / slow

Replies are listed 'Best First'.
Re^2: WWW::Mechanize::Firefox delayed returns / slow
by tcordes (Novice) on Dec 02, 2010 at 06:56 UTC

    I solved the local file problem. It was indeed just a matter of the tests running as root in cpan cmd line while my firefox was running as non-root. I gave my non-root user access to the files and now the tests are running much better.

    Here are the only failures:

    t/50-app-new.t .................... 6/8 # Failed test 'The name is "Standard"' # at t/50-app-new.t line 46. # got: 'Default' # expected: 'Standard' # Looks like you failed 1 test of 8. t/50-app-new.t .................... Dubious, test returned 1 (wstat 25 +6, 0x100) Failed 1/8 subtests t/50-mech-bufsize.t ............... 4/8

    It hangs on the last one at 4/8. I waited 15 mins, but it's hung. The hang seems very similar to the hangs my program is having. I added some debugs to that test and tried again. It's hanging forever on the mech->content_as_png call. Commenting out that call I can get the tests to continue.

    t/70-urlbar.t ..................... 1/2 # Failed test 'We changed the location once' # at t/70-urlbar.t line 48. # got: '0' # expected: '1' Use of uninitialized value $this in pattern match (m//) at t/70-urlbar +.t line 49, <DATA> line 1. # Failed test '... to that other page' # at t/70-urlbar.t line 49. # undef # doesn't match '(?-xism:/70-urlbar-2.html$)' # Looks like you failed 2 tests of 2. t/70-urlbar.t ..................... Dubious, test returned 2 (wstat 51 +2, 0x200) Failed 2/2 subtests

    Also, I get a lot (dozen+) of these warnings:

    Subroutine MozRepl::__load_plugins redefined at /usr/lib/perl5/site_pe +rl/5.10.0/Module/Pluggable/Fast.pm line 104, <DATA> line 1.

    And I had to turn off my proxy settings ("direct connection") to make a few tests pass (my squid seems to return things other than what the tests expect).

      t/50-app-new.t .................... 6/8 # Failed test 'The name is "Standard"' # at t/50-app-new.t line 46. # got: 'Default' # expected: 'Standard' # Looks like you failed 1 test of 8.

      Ouch - the name of the "Default" theme in Firefox is localized. I have to take care of that in my tests, likely by not testing for the name but the UUID of the theme.

      t/50-mech-bufsize.t ............... 4/8

      If that test hangs, this is bad for large webpages. It loads an image-heavy deviantart.com page, so that should work in any case. Maybe this is an issue with your proxy setup, but without knowing more about it, that'll be hard to replicate.

      It would be nice if you could tell me what tests fail (and how) with your proxy, as that would help me make the tests (and maybe the module) more resilient against the proxy results.

      t/70-urlbar.t ..................... 1/2 # Failed test 'We changed the location once' # at t/70-urlbar.t line 48. # got: '0' # expected: '1' Use of uninitialized value $this in pattern match (m//) at t/70-urlbar +.t line 49, <DATA> line 1. # Failed test '... to that other page' # at t/70-urlbar.t line 49. # undef # doesn't match '(?-xism:/70-urlbar-2.html$)' # Looks like you failed 2 tests of 2. t/70-urlbar.t ..................... Dubious, test returned 2 (wstat 51 +2, 0x200) Failed 2/2 subtests

      These test failures are weird, as they also seem to indicate that the script does not receive some Firefox events. I'm not sure what to make of that.

        t/50-mech-bufsize.t 4/8 hangs regardless of proxy being on/off. The pages my program access are also huge pages with lots of images. Maybe there's a connection.

        In 50-mech-bufsize.t I noticed you have bufsize=>10_000_000. Are those underscores allowed/valid? For fun I took them out, and I also tried increasing the bufsize to 100M. Didn't seem to change any results either in t tests or my program.

        As requested, here's the failing tests with proxy on. For all future testing I am leaving my proxy off to eliminate that as a possible problem.

        t/50-mech-status.t ................ 2/5 # Failed test 'GETting http://www.doesnotexist.example/ gives a 500 +(no proxy) or 404 (proxy)' # at t/50-mech-status.t line 30. # '503' # doesn't match '(?-xism:^(404|500)$)' # <html><head> <meta http-equiv="Content-Type" content="text/html; ch +arset=utf-8"> <title>ERROR: The requested URL could not be retrieved< +/title> <style type="text/css"><!-- /* # Stylesheet for Squid Error pages ... t/50-mech-status.t ................ 5/5 # Looks like you failed 1 test + of 5. t/50-mech-status.t ................ Dubious, test returned 1 (wstat 25 +6, 0x100) Failed 1/5 subtests t/51-mech-links.t ................. 3/8 # Failed test 'The two links were found' # at t/51-mech-links.t line 71. # got: '4' # expected: '2' # http://somewhere.example/relative # Failed test 'The two links were found' # at t/51-mech-links.t line 71. # got: '4' # expected: '2' # http://somewhere.example/myiframe # Failed test 'The two links were found' # at t/51-mech-links.t line 71. # got: '4' # expected: '2' # http://somewhere.example/myiframe # Failed test 'The two links were found' # at t/51-mech-links.t line 71. # got: '4' # expected: '2' # mailto:root?subject=CacheErrorInfo%20-%20ERR_DNS_FAIL&body=CacheHost +%3A%20proxy.tecnopolis.ca%0D%0AErrPage%3A%20ERR_DNS_FAIL%0D%0AErr%3A% +20%5Bnone%5D%0D%0ADNS%20ErrMsg%3A%20Name%20Error%3A%20The%20domain%20 +name%20does%20not%20exist.%0D%0ATimeStamp%3A%20Fri,%2003%20Dec%202010 +%2006%3A17%3A31%20GMT%0D%0A%0D%0AClientIP%3A%20192.168.100.1%0D%0ASer +verIP%3A%20somewhere.example%0D%0A%0D%0AHTTP%20Request%3A%0D%0AGET%20 +%2Fmyiframe%20HTTP%2F1.1%0AHost%3A%20somewhere.example%0D%0AUser-Agen +t%3A%20Mozilla%2F5.0%20(X11%3B%20U%3B%20Linux%20i686%3B%20en-US%3B%20 +rv%3A1.9.1.15)%20Gecko%2F20101027%20Fedora%2F3.5.15-1.fc12%20Firefox% +2F3.5.15%0D%0AAccept%3A%20text%2Fhtml,application%2Fxhtml+xml,applica +tion%2Fxml%3Bq%3D0.9,*%2F*%3Bq%3D0.8%0D%0AAccept-Language%3A%20en-us, +en%3Bq%3D0.5%0D%0AAccept-Encoding%3A%20gzip,deflate%0D%0AAccept-Chars +et%3A%20ISO-8859-1,utf-8%3Bq%3D0.7,*%3Bq%3D0.7%0D%0AKeep-Alive%3A%203 +00%0D%0AProxy-Connection%3A%20keep-alive%0D%0A%0D%0A%0D%0A t/51-mech-links.t ................. 10/8 # Looks like you planned 8 te +sts but ran 10. # Looks like you failed 4 tests of 10 run. t/51-mech-links.t ................. Dubious, test returned 4 (wstat 10 +24, 0x400) Failed 4/8 subtests t/70-real-status.t ................ 3/8 # Failed test 'We got a good response for a nonexistent domain' # at t/70-real-status.t line 27. # '503' # doesn't match '(?-xism:^(404|500)$)' t/70-real-status.t ................ 6/8 # Failed test 'We got a good response for a nonexistent domain' # at t/70-real-status.t line 33. # '503' # doesn't match '(?-xism:^(404|500)$)' t/70-real-status.t ................ 8/8 # Looks like you failed 2 test +s of 8. t/70-real-status.t ................ Dubious, test returned 2 (wstat 51 +2, 0x200) Failed 2/8 subtests

        And with the proxy on the urlbar.t tests now pass! However, I bet this is just a fluke. I've been seeing some non-deterministic behaviour from the module in my program, where a sequence of gets/saveurls works fine one hour, and barfs out the next. Sometimes in firefox I have to shift-reload a page, clear the cache, restart firefox, restart mozrepl, etc, to get mech to magically start working again. It probably is related to my kludge to exit early from those calls.

Re^2: WWW::Mechanize::Firefox delayed returns / slow
by tcordes (Novice) on Dec 02, 2010 at 04:56 UTC

    Hi! My firefox is 3.5.15, the latest rpm for Fedora 12. I will be switching to F14 in the not-too-distant future and I'm sure that will update me to 3.6. Maybe that will help.

    What local files would MozRepl be trying to load? My firefox is already open and it works fine/normal while browsing hundreds of pages all day long. Hmmm, perhaps because I'm doing the cpan install as root but the firefox is running as my non-root user? I'll investigate that.

    I think your guess about waiting is correct. Like I said, things always load immediately when a mech call is made, but the calls then sit there for a long time before returning.

    Further tests on simple sites indicate that it probably is just this over-the-top js site I am trying to mech that is causing the problems. The simple sites seem to mech just fine.

    Perhaps a "max_wait" option could be added to force mech call returns after X seconds even if they think they still need to wait. Something like "return with what you have at the moment" shortcut.

    I'm pretty good with perl and I could certainly help with coding and testing if I get some hints on where to look. The module and its zillion dependencies taken together are quite a daunting pile for me at the moment.

Re^2: WWW::Mechanize::Firefox delayed returns / slow
by tcordes (Novice) on Dec 02, 2010 at 07:44 UTC

    Success! I debugged and hacked some of the pm code and got the mech to go without hangs or overly long pauses.

    What I did is just a kludge, but it could hint at the real solution.

    I found that the get() was calling synchronize(). Synchronize would be slow (5-20 secs) on the $callback->(). The big problem though is after get()ing a couple of pages the subsequent _wait_while_busy($load_lock) hangs forever.

    So in _wait_while_busy I added before the while loop: my $start=time; and added at the end of the for loop: return $element if time-$start>5;

    Now my program seems to work as expected! Hooray!

    Obviously we're waiting on events that never happen. Again, some global timeout might help. Or, a rejig of this code if this is a true bug (and not just a quirk of the site I'm crawling).

    I can continue to help as required. Thanks!

      ->synchronize waits on the appropriate event to fire in Firefox. If it takes longer than you expect, most likely you're waiting for the wrong event. See the ->events method and the events => argument to the constructor on how to define your own events.

      Just adding a timeout in ->_wait_while_busy only papers over the fact that you're not receiving (or rather, listening to) the right event.

      Now, finding the right event to listen to requires some ingenuity, as the default events (DOMFrameContentLoaded, DOMContentLoaded, error, abort, stop) don't seem to fire "soon enough" in your case. Maybe load is another good event to listen to, but it fires before subframes have loaded.

        My program is very simple. I am not doing any event stuff myself. I'm just doing extremely simple get() and saveurl(). However, I see what you are saying about events now that I've been playing with the module code for a few days.

        I'll see if I can figure out how to add the load event. I'm pretty sure my site has no subframes.

        I've taken to adding Data::Dumper calls to many places in the module to see what's going on. If any of those would help you, let me know. Most of the time the 100 pages of perl code that is Dumped for one variable leaves me dazed & confused. I've taken to restricting it to 2-3 levels deep.

        Thanks, I will try more things and report back!

        I've added a lot of debug code to the module to try various things. Maybe you can help make sense of the results?

        First off, a very strange thing: I put a warn before and after $callback->() in synchronize(). On the very first hit to that code in a run it always takes 30+ secs to execute a dummy callback!!? The output is below. The long number is time(). The $VAR output is Dumper($callback).

        1291361165 zb before cb $VAR1 = sub { "DUMMY" }; 1291361200 zc after cb, before wait

        After that first hit all subsequent $callback->() (which are also all DUMMY) take 0 secs! This doesn't make any sense to me at all. How can a dummy call take 30 secs?

        I've tried adding load to the events (globally). Right before the $load_lock=... in synchronize() I added a Dumper($events) to confirm it's there:

        $VAR1 = [ 'load', 'DOMFrameContentLoaded', 'DOMContentLoaded', 'error', 'abort', 'stop' ];

        I also tried it with just load (no other events). In all cases _wait_while_busy loops until my 20sec timer stops it, so no change there. Any other events I can wait on?

        I also Dumper'd the $element in the for $element loop of _wait_while_busy, but that returns some massive structures that don't mean much to me (you're losing me with all the javascript code). However, if tidbits could help debug, I can post here as required.