Mad_Mac has asked for the wisdom of the Perl Monks concerning the following question:
I have a list of roughly 10,000 URLs, stored in a hash that maps each URL to a user-friendly file name, that I need to retrieve from a web server for local parsing and analysis.
My code seems to have a memory leak, and eventually cannot fork because it runs out of resources.
Here's the relevant bit of my code:
$curcount = 0;
$url_count = keys %url_list;
my $pm = new Parallel::ForkManager(100);

foreach $url (keys %url_list) {
    $curcount++;
    my $fname = $url_list{$url};
    printf STDERR ("\r%02d ($fname) of $url_count files retrieved.", $curcount);

    $pm->start and next;
    getstore( $url, $fname ) or die 'Failed to get page';
    $pm->finish;
}
$pm->wait_all_children;
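As a side note, LWP::Simple's getstore() returns the numeric HTTP status code rather than a plain true/false value, so a 404 is still a "true" result and the 'or die' above will not actually catch failed downloads. A minimal sketch of the usual check, reusing the variable names from the loop above (is_success() is exported by LWP::Simple alongside getstore()):

my $rc = getstore( $url, $fname );      # HTTP status code, e.g. 200 or 404
warn "Failed to get $url (HTTP $rc)\n" unless is_success($rc);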
I thought of trying LWP::Parallel, but it won't install on my system. If it matters, I am doing this with Strawberry Perl in a 32-bit Windows 7 VM on a 64-bit Linux Mint host. I'm not sure exactly which version of Perl: the MSI from Strawberry's site says 5.12.1, but perl -v reports 5.10.1. The host has 8 GB of RAM, and I have allocated 4 GB to the Win7 VM. The Perl process starts out using ~600 MB and creeps up to ~2 GB before it crashes (sometimes sooner).
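For what it is worth, fork on Win32 Perl is emulated with interpreter threads, so Parallel::ForkManager(100) allows up to 100 full interpreter clones to be alive at once, which fits the memory growth described above. Below is a minimal sketch of the same download loop using a small fixed pool of worker threads instead; the worker count of 8 and the warn-on-failure handling are illustrative assumptions, and %url_list is assumed to be populated exactly as in the original code:

use strict;
use warnings;
use threads;
use Thread::Queue;
use LWP::Simple;                          # getstore() and is_success()

my $workers = 8;                          # illustrative; tune for your bandwidth
our %url_list;                            # assumed to be filled in elsewhere, as in the original

# Queue every URL/filename pair, then one undef "stop" marker per worker.
my $q = Thread::Queue->new;
$q->enqueue( [ $_, $url_list{$_} ] ) for keys %url_list;
$q->enqueue(undef) for 1 .. $workers;

my @pool = map {
    threads->create( sub {
        while ( my $job = $q->dequeue ) { # blocks; exits on the undef marker
            my ( $url, $fname ) = @$job;
            my $rc = getstore( $url, $fname );
            warn "Failed to get $url (HTTP $rc)\n" unless is_success($rc);
        }
    } );
} 1 .. $workers;

$_->join for @pool;

Because only $workers threads ever exist, memory use stays roughly flat instead of climbing with each new pseudo-fork.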
So, my questions are:
1. Why does memory usage keep growing until the script can no longer fork, and how can I stop it?
2. Is there a faster or more robust way to fetch this many pages from Strawberry Perl on Windows?
Thanks
Replies are listed 'Best First'.
Re: Get 10,000 web pages fast
by BrowserUk (Patriarch) on Jun 17, 2010 at 12:58 UTC
by Mad_Mac (Beadle) on Sep 28, 2010 at 11:49 UTC
by BrowserUk (Patriarch) on Sep 28, 2010 at 15:13 UTC
Re: Get 10,000 web pages fast
by Anonymous Monk on Jun 17, 2010 at 12:13 UTC
Re: Get 10,000 web pages fast
by aquarium (Curate) on Jun 18, 2010 at 04:03 UTC
Re: Get 10,000 web pages fast
by pemungkah (Priest) on Jun 19, 2010 at 02:15 UTC