Is it possible to get LWP::Protocol::collect to bite off larger amounts of data? With use LWP::Debug qw(+); turned on, I get output like:
LWP::Protocol::collect: read 1418 bytes
LWP::Protocol::collect: read 1418 bytes
LWP::Protocol::collect: read 1418 bytes

Fetching a page in such small 'nibbles' is very, very slow. Is this server-side dependent? Can I increase the read size on my end, programmatically, to speed things up? (Changing the MTU didn't help, but it was worth a shot. =) This isn't terribly important, but the next issue is.

I'm also using Parallel::ForkManager to spawn fetchers, and I've got 10 running concurrently:

my $pm = Parallel::ForkManager->new(10);

I notice that 90% or more of the script's execution time is spent inside the wait() portion of ForkManager. Why does it sit there so long, blocking on forked children?

My code looks roughly like this. It works, but it now seems horribly slow, blocking on children. I recently moved from using arrays to store the data and links to using hashes (thanks Corion and jeffa), and at that point I noticed things slowing down considerably. I see about 2,000 wait() events for every 1 fetch event from the children:

fetch_content(@urls);
$pm->wait_all_children;

## Run when children are forked
$pm->run_on_start(
    sub {
        my ($pid, $link) = @_;
        $link =~ s/\s+$//;

        ## Count the pages fetched thus far
        $pagecount++;
    }
);

## Run when child processes complete
$pm->run_on_finish(
    sub {
        my ($pid, $exit_code, $ident) = @_;
        print "\n** $ident out of the pool ".
              "with PID $pid and exit code: $exit_code\n";
    }
);

## Run when blocking/waiting for children
$pm->run_on_wait(
    sub {
        print "-"x74, "\n";
        print "\n** Waiting for child ...\n";
        print "-"x74, "\n";
    },
    0.1
);

## Fetch the actual page and links
sub fetch_content {
    my @urls = @_;

    for my $link (@urls) {
        my $pid = $pm->start($link) and next;

        # fetch the page, extract the links
        # (all of the fetching/extraction works)

        $pm->finish;
    }
}

In reply to Biting off more with LWP and problems with blocking forks()? by hacker
