CodeJunkie has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am having problems with LWP::Parallel::UserAgent. I have written a parallel web robot for surveying the internet which works quite well. The trouble is it keeps falling over. I think this is a problem with the UserAgent module. The error I get is:

Undefined subroutine &LWP::Parallel::UserAgent::_new_response called at /usr/lib/perl5/site_perl/5.6.0/LWP/Parallel/UserAgent.pm line 1447.

It seems to have problems with certain URLs; from what I can gather it seems to be URLs that have META redirects (but I could be wrong). I can't find the subroutine _new_response defined anywhere in the module, but I don't know much about modules and I don't want to start hacking around with it when I don't know what I'm doing! ;-)

So does anyone have any ideas about what I should do to stop my robot crashing out?

Many thanks,
Tom

Replies are listed 'Best First'.
Re: Problem with LWP::Parallel::UserAgent
by webfiend (Vicar) on Mar 10, 2003 at 18:00 UTC

    It's been a while since I used LWP::Parallel, but let me see if I can be of more general help. _new_response is a method of LWP::UserAgent. Are you remembering to create an LWP::Parallel::UserAgent object, or are you trying to access the methods without an object? That could cause some confusion.

    If it isn't something easy like that, you might want to post your code, so some of the other folks here can take a peek at it.
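
    The difference between the two calling styles can be seen with a plain Perl stub (My::UA here is invented for illustration and is not part of LWP; it just uses the same method-call convention):

    ```perl
    #!/usr/bin/perl -w
    use strict;

    # A toy class with the same calling convention as LWP::Parallel::UserAgent.
    package My::UA;
    sub new      { return bless { registered => 0 }, shift }
    sub register { my $self = shift; return ++$self->{registered} }

    package main;

    # Correct: construct an object first, then call methods on it.
    my $ua = My::UA->new;
    $ua->register('http://www.bbc.co.uk');    # works: $self is the object

    # Incorrect: calling the sub directly, without an object. $self ends up
    # holding the URL string instead of a blessed reference, so the first
    # hash access dies under strict refs.
    eval { My::UA::register('http://www.bbc.co.uk') };
    print "Direct call failed: $@" if $@;
    ```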


    I just realized that I was using the same sig for nearly three years.

      OK, this is my (I'm sure highly inefficient and embarrassingly limited) Perl code for a web robot. ;) I have left lots and lots out to try not to bore people with the details. It works fine for ages, but just crashes out when looking up certain URLs... weird.

#!/usr/bin/perl -w
########################################################
# Program written by Tom Freeman 2003
# -----------------------------------
# Purpose: To collect information about the technologies
# being used on websites, particularly client side
# technologies. As well as the server the sites are using.
#
# Improvements
# v1.8: Added support for the META exclusion protocol
#       i.e. <META name="ROBOTS" content="NOINDEX, NOFOLLOW">
#
# v3.3: now retains a list of new domains on disk in
#       domains.dat file
########################################################

require LWP::Parallel::UserAgent;
use HTTP::Request;
#use LWP::Debug qw(+); # display tons of debugging messages. See 'perldoc LWP::Debug'
use URI;
use URI::URL;
use HTML::LinkExtor;
use LWP::Simple qw(get);
use Time::SoFar qw( runtime runinterval figuretimes );
use Fcntl;
use DBI;

my $domainsfile = 'domains.dat';

####### VARIABLES ######################################
my $pua = LWP::Parallel::UserAgent->new();
$pua->in_order  (1); # handle requests in order of registration
$pua->duplicates(0); # ignore duplicates
$pua->timeout   (2); # in seconds
$pua->redirect  (1); # follow redirects

my @domains = ();
push @domains, 'http://www.bbc.co.uk'; # load first value onto the domains array
my $pass_counter = 0;
my $num_domains  = scalar @domains;

while ($num_domains > 0) { # Start Loopy Code

    ######## Load up the old domains file from disk ######
    open DOMAINSFILE, "< $domainsfile"
        or print "Failed to open $domainsfile for reading: $!";
    @domains = (<DOMAINSFILE>);
    close(DOMAINSFILE);
    ######################################################

    $pass_counter++;

    # Pick ten unvisited domains. checkExists() is defined elsewhere
    # (omitted here along with the rest of the database code).
    my @selected_domains = ();
    while (scalar @selected_domains < 10) {
        my $new_domain = shift @domains;
        my $visited    = &checkExists($new_domain);
        if (!$visited) {
            push @selected_domains, $new_domain;
        }
    }

    my $reqs = [ map { HTTP::Request->new('GET', $_) } @selected_domains ];

    foreach my $req (@$reqs) {
        if ($req->url) {
            print "Registering " . $req->url . "\n";
            if ( my $res = $pua->register($req) ) {
                print STDERR $res->error_as_HTML;
            }
        }
    }

    my $entries = $pua->wait();

    foreach (keys %$entries) {
        my $res    = $entries->{$_}->response;
        my $domain = $res->request->url;
        if ($res->is_success) {
            # Do lots of interesting stuff with the
            # content and write out to the database
        }
    }

    $num_domains = scalar @domains; # get the size of array

    ######### Write out the new domains file ######
    open DOMAINSFILE, "> $domainsfile"
        or print "Failed to open $domainsfile for writing: $!";
    foreach my $domain (@domains) {
        print DOMAINSFILE "$domain\n";
    }
    close(DOMAINSFILE);
    ###############################################

    (@domains) = ();
    undef %$entries;
}
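
      Until the cause inside the module is found, one way to keep a robot like this alive is to wrap the crash-prone call (here, the $pua->wait() step) in an eval block, log the failure, and carry on. A minimal, self-contained sketch of the pattern, with a stub standing in for the real fetch (risky_fetch and the example URLs are invented for illustration):

      ```perl
      #!/usr/bin/perl -w
      use strict;

      # Stub standing in for the crash-prone call ($pua->wait in the robot
      # above); it dies the same way the module does on the troublesome URLs.
      sub risky_fetch {
          my ($url) = @_;
          die "Undefined subroutine &LWP::Parallel::UserAgent::_new_response called"
              if $url =~ /redirects/;
          return "ok: $url";
      }

      my @fetched;
      foreach my $url ('http://plain.example.com', 'http://redirects.example.com') {
          my $result = eval { risky_fetch($url) };
          if ($@) {
              warn "Skipping $url: $@";    # log the failure and carry on
              next;
          }
          push @fetched, $result;
      }
      # @fetched now holds only the successful results; the loop never dies.
      ```

      The same eval wrapper around the real wait() call would turn a fatal crash on one batch of URLs into a logged skip, at the cost of losing that batch's results.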

      I hope this isn't so much code that it annoys people; I'd be interested to hear any of your thoughts, ideas and comments.

      Many Thanks,
      Tom