I have a list of ~10,000 URLs, stored in a hash along with a user-friendly name for each, that I need to retrieve from a webserver for local parsing and analysis.
My code seems to have a memory leak, and eventually cannot fork because it runs out of resources.
Here's the relevant bit of my code:
use LWP::Simple;             # provides getstore()
use Parallel::ForkManager;

my $curcount  = 0;
my $url_count = keys %url_list;
my $pm        = new Parallel::ForkManager(100);   # up to 100 children at once

foreach my $url (keys %url_list) {
    $curcount++;
    my $fname = $url_list{$url};
    printf STDERR ("\r%02d ($fname) of $url_count files retrieved.", $curcount);
    $pm->start and next;     # parent gets the child's pid and moves on; child falls through
    getstore($url, $fname) or die 'Failed to get page';
    $pm->finish;             # child exits here
}
$pm->wait_all_children;
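For reference, here is a cut-down sketch of what I'm considering trying instead. It keeps the same Parallel::ForkManager / LWP::Simple approach, but the pool size of 10, the is_success() check, and the run_on_finish callback are guesses on my part, not something I've tested yet:

use strict;
use warnings;
use LWP::Simple;            # getstore() plus the HTTP::Status helpers like is_success()
use Parallel::ForkManager;

# Assumed to be filled the same way as in the code above: url => friendly file name
my %url_list = ('http://example.com/page1.html' => 'page1.html');

my $pm = Parallel::ForkManager->new(10);   # 10 children at a time instead of 100

# Runs in the parent each time a child exits; reports any failed fetch
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $fname) = @_;
    warn "\nFailed to fetch $fname\n" if $exit_code;
});

my $curcount  = 0;
my $url_count = keys %url_list;

foreach my $url (keys %url_list) {
    $curcount++;
    my $fname = $url_list{$url};
    printf STDERR ("\r%d of $url_count files retrieved.", $curcount);

    $pm->start($fname) and next;            # parent: move on to the next URL
    my $status = getstore($url, $fname);    # child: download one page
    $pm->finish(is_success($status) ? 0 : 1);
}
$pm->wait_all_children;

My thinking is that each child is a full copy of the parent process, so capping the pool much lower should keep peak memory at roughly 10 copies instead of 100.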
I thought of trying LWP::Parallel, but it won't install on my system. If it matters, I am doing this with Strawberry Perl in a 32-bit Windows 7 VM on a 64-bit Linux Mint host. I'm not sure exactly which version of Perl; the MSI from Strawberry's site says 5.12.1, but perl -ver says 5.10.1. The host has 8 GB RAM, and I have allocated 4 GB to the Win7 VM. The Perl process starts out using ~600 MB and creeps up to ~2 GB before it crashes (sometimes sooner).
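One thing I suspect (but haven't confirmed) is that fork() under Win32 Perl is emulated with interpreter threads, so each concurrent child clones the whole interpreter, including the 10,000-entry hash. As a fallback that avoids forking entirely, here is a rough sketch using a single keep-alive LWP::UserAgent (LWP is clearly installed already, since getstore() comes from LWP::Simple); the mirror() call and the keep_alive/timeout settings are assumptions, not part of my current code:

use strict;
use warnings;
use LWP::UserAgent;

# Assumed hash layout, as above: url => friendly file name
my %url_list = ('http://example.com/page1.html' => 'page1.html');

# One UserAgent reused for every request; keep_alive reuses TCP
# connections when many of the URLs point at the same host
my $ua = LWP::UserAgent->new(keep_alive => 10, timeout => 30);

my $curcount  = 0;
my $url_count = keys %url_list;

foreach my $url (keys %url_list) {
    $curcount++;
    my $fname = $url_list{$url};
    printf STDERR ("\r%d ($fname) of $url_count files retrieved.", $curcount);

    my $res = $ua->mirror($url, $fname);    # saves the body straight to $fname
    warn "\nFailed $url: " . $res->status_line . "\n"
        unless $res->is_success or $res->code == 304;   # 304 = local copy already current
}

It would be slower per page than fetching in parallel, but memory should stay flat.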
So, my questions are:

1. What is actually causing the memory use to keep climbing until the process can no longer fork?
2. Is there a better (or at least faster) way to pull down this many pages, given that LWP::Parallel won't install here?
Thanks