bliako has asked for the wisdom of the Perl Monks concerning the following question:
Esteemed Monks,
I have the following script which attempts to download 3 URLs but gets stuck in $mech->get($url) on the second. The last trace line is 'method' => 'Network.loadingFinished',.
Does anyone have any debugging tips I can use to find out why it gets stuck there? Using other URLs seems to work OK.
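One generic debugging trick for any Perl script that hangs (a sketch, not specific to WWW::Mechanize::Chrome): install a signal handler that dumps a stack trace via the core Carp module, then hit Ctrl-C while the script is stuck. The confess() output shows the call chain at the point of interruption, which tells you which call is blocking.

```perl
#!/usr/bin/env perl
# Sketch: dump a stack trace on Ctrl-C to see where a hung script is
# blocked. Carp is a core module, so nothing extra needs installing.
use strict;
use warnings;
use Carp ();

$SIG{INT} = sub { Carp::confess("interrupted, stack trace follows:") };

# ... long-running code goes here; press Ctrl-C while it hangs and
# confess() will die with a full backtrace from the blocking call.
```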
```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Log::Log4perl qw(:easy);
use WWW::Mechanize::Chrome;

my @urls = (
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-20',
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-21',
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-22',
);

Log::Log4perl->easy_init($TRACE);

print "$0 : starting headless chrome ...\n";
my $mech = WWW::Mechanize::Chrome->new(
    headless   => 1,
    launch_arg => [
        '--password-store=basic',
        '--remote-debugging-port=9223',
        '--enable-logging',
        '--disable-gpu',
        '--no-sandbox',
        '--ignore-certificate-errors',
        '--disable-background-networking',
        '--disable-client-side-phishing-detection',
        '--disable-component-update',
        '--disable-hang-monitor',
        '--disable-save-password-bubble',
        '--disable-default-apps',
        '--disable-infobars',
        '--disable-popup-blocking',
        '--disable-default-apps',
    ],
);
if( ! defined($mech) ){
    print STDERR "$0 : call to WWW::Mechanize::Chrome->new() has failed.\n";
    exit(1);
}
print "$0 : done, headless chrome is now running.\n";

$mech->add_header('User-agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.39 Safari/537.36');

my $idx = 1;
foreach my $aurl (@urls){
    my $outfile = "out.$idx.png";
    print "$0 : about to get '$aurl'\n";
    get_and_shot($mech, $aurl, $outfile, 4)
        or die "get_and_shot() : url '$aurl'";
    print "$0 : done got '$aurl'\n";
    $idx++;
}

# returns 0 on failure, 1 on success
sub get_and_shot {
    my $mech      = $_[0];
    my $aurl      = $_[1];
    my $outfile   = $_[2];
    my $sleeptime = $_[3] || 2;

    print 'get_and_shot()'." : entered for url '$aurl'\n";
    if( ! defined($mech) ){ print "$0 : mock done\n"; return 1 }

    print 'get_and_shot()'." : getting url '$aurl'\n";
    if( ! $mech->get($aurl) ){
        print STDERR "get_and_shot() : call to get() has failed for url '$aurl'.\n";
        return 0;
    }
    print 'get_and_shot()'." : got OK url '$aurl'.\n";

    my $page_png = $mech->content_as_png();
    my $fh;
    if( ! open($fh, '>', $outfile) ){
        print STDERR "get_and_shot() : could not save url '$aurl' to output file '$outfile', $!\n";
        return 0;
    }
    binmode $fh, ':raw';
    print $fh $page_png;
    close $fh;

    print 'get_and_shot()'." : saved OK '$aurl' to '$outfile', now sleeping for $sleeptime seconds ...\n";
    sleep($sleeptime);
    print 'get_and_shot()'." : done, woken up now and exiting sub.\n";
    return 1; # success
}
```
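As a defensive measure while debugging, the blocking call could be wrapped in an alarm() timeout so the loop fails loudly instead of hanging forever. This is only a sketch: with_timeout() and the 30-second figure are my own illustrative choices, not part of WWW::Mechanize::Chrome.

```perl
#!/usr/bin/env perl
# Sketch: run a possibly-hanging callback under an alarm() timeout.
# Returns (1, result) on success, or (0, 'timeout') if it hung.
use strict;
use warnings;

sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # trailing \n stops Perl appending " at FILE line N" to the message
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    alarm(0);   # belt-and-braces: always cancel any pending alarm
    if( ! $ok ){
        die $@ unless $@ eq "timeout\n";   # re-throw unrelated errors
        return (0, 'timeout');
    }
    return (1, $result);
}

# Intended (hypothetical) use in the loop above:
#   my ($ok) = with_timeout(30, sub { $mech->get($aurl) });
#   return 0 unless $ok;   # give up on this url instead of hanging
```

Note that alarm() and sleep() can interfere on some platforms (see perlipc), so the timeout should only wrap the get() call, not the whole of get_and_shot().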
As I said, it works fine with other URLs until it encounters the 2nd or 3rd URL from zoom.earth. For example, setting @urls to:
```perl
my @urls = (
    'http://www.ibm.com',
    'http://www.ibm.com',
    'http://www.ibm.com',
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-20',
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-21',
    'https://zoom.earth/#34.957995,32.299805,5z,sat,am,2018-07-22',
);
```
it will stop after all the ibm.com URLs have been fetched and screenshotted.
Update 1: even the ibm.com URLs don't always work - mech sometimes gets stuck on them too ...
Any ideas?
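Since zoom.earth is a JavaScript-heavy page, one thing worth trying is to wait until the document reports itself loaded before taking the screenshot, rather than relying on a fixed sleep. A sketch, assuming $mech->eval runs JavaScript in the page (WWW::Mechanize::Chrome documents an eval/eval_in_page method); the retry count is arbitrary, and this is exercised below against a mock object rather than a live browser:

```perl
#!/usr/bin/env perl
# Sketch: poll document.readyState until the page settles.
# Assumes a $mech object whose eval() method runs JavaScript in the
# page and returns its result, as WWW::Mechanize::Chrome's does.
use strict;
use warnings;

sub wait_for_ready {
    my ($mech, $tries) = @_;
    $tries ||= 10;
    for (1 .. $tries) {
        my ($state) = $mech->eval('document.readyState');
        return 1 if defined($state) && $state eq 'complete';
        sleep 1;    # give the page another second to finish loading
    }
    return 0;       # still not ready; caller decides what to do
}
```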
Replies are listed 'Best First'.
Re: WWW::Mechanize::Chrome : gets stuck sometimes
  by Corion (Patriarch) on Aug 01, 2018 at 18:17 UTC
    by bliako (Abbot) on Aug 01, 2018 at 20:39 UTC
      by Corion (Patriarch) on Aug 02, 2018 at 06:59 UTC
        by bliako (Abbot) on Aug 02, 2018 at 12:12 UTC

Re: WWW::Mechanize::Chrome : gets stuck sometimes
  by Corion (Patriarch) on Aug 01, 2018 at 13:44 UTC