Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I was wondering if anyone could tell me what's wrong with the following code. I am trying to fetch multiple web pages simultaneously (using fork) and extract the links, which get added to $returnstring. Everything works fine if I don't use fork, but when I do, I get errors. I am using IIS with ActivePerl 5.60.623. Thanks in advance.

use HTTP::Request::Common;
use HTML::Parser;
use HTML::TreeBuilder;
use LWP::UserAgent;
use URI::URL;

@longurl = ['http://1.htm', 'http://2.htm', 'http://3.htm' ];
@longfie = ['PostData1', 'PostData2', 'PostData3' ];
$sitetotal=2;
my $pid = fork;
for ($sitenum=0; $sitenum <= $sitetotal; $sitenum++) {
    if ($pid = fork) { next; }
    if (defined $pid) {
        $ua[$sitenum] = new LWP::UserAgent;
        $ua[$sitenum]->timeout(4);
        $res[$sitenum] = $ua[$sitenum]->request(POST $longurl[$sitenum], Content => [ DataField => "$longfie[$sitenum]" ] );
        if ($res[$sitenum]->is_success) {
            $p[$sitenum]->parse($res[$sitenum]->content);
        }
        $base = $res[$sitenum]->base;
        $siten = $longsite[$sitenum];
        $p[$sitenum]->traverse(\&extract_alinks, 1);
        exit;
    } else {
        die "Fork failed at number $sitenum: $!\n";
    }
}
$returnstring;

sub extract_alinks1 {
    #extract links and add to $returnstring
}

Re: Perl fork and http::request
by Corion (Patriarch) on Feb 02, 2001 at 01:37 UTC

    First of all, putting Perl code into <code> ... </code> tags helps immensely with the readability.

    Second, the fork() support in ActiveState Perl and Win32 is marginal at best. It's there, and there are situations in which it works, but implementing fork() on a Win32 platform is mostly done by copying the whole process space, and some things are not copied. So you will have to find alternatives to using fork(), for example nonblocking sockets or in your case maybe LWP::Parallel.
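For comparison, on a platform where fork() works reliably, the loop in the question would still need two changes. The extra `my $pid = fork;` before the loop makes two processes run the whole loop, and anything a child stores in a variable such as $returnstring never reaches the parent. A minimal sketch of one common pattern, with one pipe per child, is shown here. It assumes a Unix-like Perl; `fetch_links` is a hypothetical stand-in for the real LWP request and link extraction, and `fetch_all_forked` is likewise only an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real work (POST the request, extract links).
sub fetch_links {
    my ($url) = @_;
    return "links-from:$url";
}

# Fork one child per URL. Each child writes its result to its own pipe,
# because variables set in a child are not visible to the parent.
sub fetch_all_forked {
    my @urls = @_;
    my @children;

    for my $url (@urls) {
        pipe(my $reader, my $writer) or die "pipe failed: $!\n";

        my $pid = fork;
        die "Fork failed for $url: $!\n" unless defined $pid;

        if ($pid == 0) {
            # Child: send the result to the parent, then stop.
            close $reader;
            print {$writer} fetch_links($url);
            close $writer;
            exit 0;
        }

        # Parent: keep only the read end and go on to the next URL.
        close $writer;
        push @children, { pid => $pid, url => $url, reader => $reader };
    }

    # Collect every child's output, then reap the child.
    my %links_for;
    for my $child (@children) {
        my $fh = $child->{reader};
        my $output = do { local $/; <$fh> };
        close $fh;
        waitpid $child->{pid}, 0;
        $links_for{ $child->{url} } = defined $output ? $output : '';
    }
    return %links_for;
}

my %links_for = fetch_all_forked('http://1.htm', 'http://2.htm', 'http://3.htm');
print "$_ => $links_for{$_}\n" for sort keys %links_for;
```

The parent closes its copy of each write end right after forking, so reading from a pipe ends as soon as that child finishes. None of this helps under PerlScript on IIS if fork() is unavailable there; it only illustrates the shape a forking version would take.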

      Thanks. Here is the code again. I looked into LWP::Parallel. It seems it's not supported in ActivePerl on the Windows platform. Is there another way I can accomplish the same thing?
      use HTTP::Request::Common;
      use HTML::Parser;
      use HTML::TreeBuilder;
      use LWP::UserAgent;
      use URI::URL;

      @longurl = ('http://1.htm', 'http://2.htm', 'http://3.htm');
      @longfie = ('PostData1', 'PostData2', 'PostData3');
      $sitetotal=2;
      my $pid = fork;
      for ($sitenum=0; $sitenum <= $sitetotal; $sitenum++) {
          if ($pid = fork) { next; }
          if (defined $pid) {
              $ua[$sitenum] = new LWP::UserAgent;
              $ua[$sitenum]->timeout(4);
              $res[$sitenum] = $ua[$sitenum]->request(POST $longurl[$sitenum], Content => [ DataField => "$longfie[$sitenum]" ] );
              if ($res[$sitenum]->is_success) {
                  $p[$sitenum]->parse($res[$sitenum]->content);
              }
              $base = $res[$sitenum]->base;
              $siten = $longsite[$sitenum];
              $p[$sitenum]->traverse(\&extract_alinks, 1);
              exit;
          } else {
              die "Fork failed at number $sitenum: $!\n";
          }
      }
      $returnstring;

      sub extract_alinks1 {
          #extract links and add to $returnstring
      }
Re: Perl fork and http::request
by BlueLines (Hermit) on Feb 02, 2001 at 01:37 UTC
    I'm not sure exactly why the code barfs (I don't have a Windows machine to play with), but you may want to check out LWP::Parallel, which does web requests in parallel without making you deal with forking.

    BlueLines

    Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.
Re: Perl fork and http::request
by Fastolfe (Vicar) on Feb 02, 2001 at 01:50 UTC
    It might also help if you provided us with the errors you are seeing.
      Sorry, so that's how you do it... I pasted the code again. Does LWP::Parallel::UserAgent let me parse the output, for example to extract links? Is it included with ActivePerl? I couldn't find any information on that. If so, could someone give me an example? Here is the error I get:
      Invoking main::Search error '80004005'
      fork() is not implemented in PerlScript at (eval 2) line 16.
      ? error '80004005'
      Unspecified error
      /scripts/searchm/search.inc, line 542
        So it looks like your problem is simply the lack of a fork implementation in the version of ActivePerl/"PerlScript" you're using. I believe ActiveState's release of Perl 5.6 includes some emulated fork support that might allow your script to run as you expect, but I still highly recommend LWP::Parallel for parallel web page requests; its documentation is on CPAN. I don't know whether ActivePerl ships with it, but I suspect it doesn't. Try using PPM to install it. Otherwise, you'll have to build it by hand. Perhaps someone else has more information about this.
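As a rough illustration of the LWP::Parallel suggestion, here is a sketch that registers several POST requests with LWP::Parallel::UserAgent and extracts the links from each successful response. It assumes LWP::Parallel and HTML::LinkExtor are installed (whether the former installs cleanly under ActivePerl is not confirmed in this thread). The names `extract_links` and `fetch_links_parallel` are made up for this example, HTML::LinkExtor is swapped in for the HTML::TreeBuilder traversal in the question, and the form field name DataField comes from the question's code.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use HTML::LinkExtor;

# Return the href of every <a> tag in $html, made absolute against $base.
sub extract_links {
    my ($html, $base) = @_;
    my $extor = HTML::LinkExtor->new(undef, $base);
    $extor->parse($html);
    $extor->eof;

    my @hrefs;
    for my $link ($extor->links) {
        my ($tag, %attr) = @$link;
        push @hrefs, "$attr{href}" if $tag eq 'a' && defined $attr{href};
    }
    return @hrefs;
}

# Takes url => post-data pairs, sends all requests in parallel, and
# returns a hash mapping each answered URL to an array ref of its links.
sub fetch_links_parallel {
    my %post_data_for = @_;

    # Loaded here so the rest of the file compiles without the module.
    require LWP::Parallel::UserAgent;
    my $pua = LWP::Parallel::UserAgent->new;
    $pua->timeout(4);

    for my $url (sort keys %post_data_for) {
        my $req = POST $url, Content => [ DataField => $post_data_for{$url} ];
        # register() returns an error response only when it fails.
        if (my $err = $pua->register($req)) {
            warn "Could not register $url: ", $err->message, "\n";
        }
    }

    # Block until every request has finished or timed out.
    my $entries = $pua->wait;

    my %links_for;
    for my $entry (values %$entries) {
        my $res = $entry->response;
        next unless $res->is_success;
        $links_for{ $res->request->uri } =
            [ extract_links($res->content, $res->base) ];
    }
    return %links_for;
}
```

With the question's data, a call might look like `my %links_for = fetch_links_parallel('http://1.htm' => 'PostData1', 'http://2.htm' => 'PostData2', 'http://3.htm' => 'PostData3');`. Because everything runs in one process, the results are ordinary Perl data in the caller, so no fork() is needed to gather them.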