giantpanda has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm currently coding a web crawler and I've been using WWW::Mechanize::Firefox to extract data from pages that keep loading content via JavaScript. Here's the code for this part of the script:

use strict;
use warnings;
use WWW::Mechanize::Firefox;   # requires the MozRepl addon for Firefox
use WWW::Mechanize;
use Date::Manip;
use DateTime;

open TXT, "<", "nickname.log" or die "Cannot open nickname.log: $!";
while (!eof(TXT)) {
    my $nick = <TXT>;
    chomp $nick;

    for my $j (0 .. 3) {
        open MATCH, "+<", "match.log" or die "Cannot open match.log: $!";
        my @gotmatch = <MATCH>;

        # Go back $j weeks from today
        my $date = DateTime->now;
        my $k    = 0;
        while ($k < $j) {
            $date = $date->subtract( days => 7 );
            $k++;
        }
        $date = $date->ymd;

        my $url      = "http://www.quakelive.com/#profile/matches/$nick/$date";
        my $firemech = WWW::Mechanize::Firefox->new();
        $firemech->get($url);
        die "Cannot connect to $url\n" if !$firemech->success();

        # Wait up to 10 seconds for the JavaScript-loaded content to appear
        my $retries = 10;
        while ($retries-- and !$firemech->is_visible( xpath => '//*[@class="areaMapC"]' )) {
            sleep 1;
        }
        die "Timeout" unless $retries;

        my $content = $firemech->content();
        while ($content =~ /class="areaMapC" id="([^<]+)_([^<]+)_([^<]+)">/gsi) {
            my $game    = $1;
            my $longid  = $2;
            my $shortid = $3;
            my $matchid = "$longid/$game/$shortid\n";

            # Check match.log for duplicates
            my $flag = 0;
            for my $l (0 .. $#gotmatch) {
                if ($matchid eq $gotmatch[$l]) {
                    $flag = 1;
                }
            }
            if ($flag == 0) {
                print MATCH $matchid;
            }
        }
        close MATCH;
        undef $firemech;
    }
}

The rest of the script (which uses WWW::Mechanize) runs perfectly, but this part breaks, giving these errors:

(in cleanup) Can't call method "cmd" on an undefined value at /Library/Perl/5.10.0/MozRepl/Client.pm line 186, <DATA> line 42 during global destruction.
(in cleanup) Can't call method "execute" on an undefined value at /Library/Perl/5.10.0/MozRepl.pm line 372, <DATA> line 42 during global destruction.
(in cleanup) Can't call method "is_debug" on an undefined value at /Library/Perl/5.10.0/MozRepl/Client.pm line 188, <DATA> line 42 during global destruction.

Note that sometimes only one of them appears, sometimes all of them, and the "is_debug" one is the rarest. Also note that the nickname at which they show up is different every run (sometimes the 6th, sometimes the 15th, etc.).

On Stack Overflow I was told to "undef $firemech", but that didn't solve the issue. Google hasn't been helpful so far, nor has my Perl professor.

Thanks.

Replies are listed 'Best First'.
Re: MozRepl cleanup problem
by Corion (Patriarch) on Oct 30, 2010 at 12:26 UTC

    These messages appear when the global destruction of objects happens. Global destruction does not respect the usual order of object destruction anymore, and in your case this means that sometimes JavaScript proxy objects are still alive while the bridge to Firefox has already gone down.

    I told you on StackOverflow to undef $mech. If that doesn't solve your problems, maybe you keep other references into Firefox. You need to break these references before starting global destruction as well.

    Update: It seems that you're creating a WWW::Mechanize::Firefox object over and over again in a loop. Most likely, you'll be better off by creating your object outside of your loop - at least that should be faster as the initialization within Firefox then only needs to happen once.
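    Roughly like this minimal sketch (the $nick and $date assignments are placeholders for the values your outer loops compute, and the page handling is trimmed):

        use strict;
        use warnings;
        use WWW::Mechanize::Firefox;

        # Create the bridge to Firefox once, outside all loops
        my $firemech = WWW::Mechanize::Firefox->new();

        my $nick = 'some_nick';     # placeholder; read from nickname.log in your script
        my $date = '2010-10-30';    # placeholder; computed with DateTime in your script

        for my $j (0 .. 3) {
            my $url = "http://www.quakelive.com/#profile/matches/$nick/$date";
            $firemech->get($url);
            die "Cannot connect to $url\n" unless $firemech->success();
            # ... wait for the page, pull out the match ids, update match.log ...
        }

        # Drop the last reference explicitly before the program exits, so the
        # connection is torn down before global destruction starts
        undef $firemech;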

      First of all, thanks for the fast reply.

      I tried moving the creation of $firemech outside the loop, but this made the script hang after moving to the second page. To explain further: it connected to the first page, got everything I asked for, then connected to the second one (I could see it in Firefox), but nothing happened. I waited something like 5 minutes and it didn't even time out.

      Could you explain again what you mean by "maybe you keep other references into Firefox"? I don't understand which of the things I use in those loops I should break.

        All the data you pull out of Firefox as HTML elements (through the ->xpath method or the ->selector method, for example) keeps the bridge into Firefox alive. So you need to make sure you don't have any such objects lying around anymore when you want a clean exit (or you just ignore the warning messages).
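        As a rough sketch (the xpath query is just an illustration), keeping such node objects in a small scope is usually enough:

            use strict;
            use warnings;
            use WWW::Mechanize::Firefox;

            my $firemech = WWW::Mechanize::Firefox->new();
            $firemech->get('http://www.quakelive.com/');

            {
                # Each node object returned by ->xpath (or ->selector) holds
                # a handle into Firefox
                my @nodes = $firemech->xpath('//*[@class="areaMapC"]');
                # ... work with @nodes here ...
            }   # @nodes goes out of scope here and releases its handles

            undef $firemech;    # nothing is left to keep the bridge alive at exit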

        I'm not sure why the script would hang at the second round. Maybe the site does not load some data it loads on the first round. You'll have to debug the behaviour of the site when automating it.