jplan34 has asked for the wisdom of the Perl Monks concerning the following question:

Here be my code:
_________________________
foreach my $sid (@sids) {
    $sth = Database::query("select SUrl from Sites where SId = '$sid'", $dbh);
    my ( $site ) = $sth->fetchrow_array();
    push @reqs, HTTP::Request->new(GET => $site);
    push @url_list, $site;
    $sth->finish;
}

my ($req,$res);

#Creating a new LWP UserAgent
my $pua = LWP::Parallel::UserAgent->new();

# sets the maximum number of requests, and the maximum number of host
# allowed at the same time to 15
# set the UserAgent to connect to requests in order.
$pua->max_hosts(15);
$pua->max_req(15);
$pua->in_order(1);

# Registers the requests so the useragent knows which websites to go to.
foreach my $req (@reqs) {
    $pua->register ($req);
}

# executes requests
my $entries = $pua->wait();
my $counter = 0;

foreach (keys %$entries) {
    my $res = $entries->{$_}->response;
    my $content = $res->content;
    my $url = @url_list[$counter];

    # ... update the copy of the site content in Sites.SLatestVersion
    my $sth = $dbh->prepare("update Sites set SLatestVersion = ?,SLastUpdated = NOW() where SUrl = ?", $dbh)
        or die "Couldn't prepare statement!" . $dbh->
    $sth->execute( $content, $url );
    $sth->finish;

    # retrieve site ID for that specific webpage
    $sth = Database::query("select SId from Sites where SUrl = '$url'", $dbh);
    my $sid = $sth->fetchrow_array;
    $sth->finish;

    # set DDiffExists to zero for all entries in the Docs table for this SId
    $sth = Database::query("update Docs set DDiffExists = '0' where DTagSId = '$sid'", $dbh);
    $counter++;
}
_____________________________
The problem is that LWP::Parallel remaps the url: at the start the url is http://www.toonami.com, but by the end it has become http://www.cartoonnetwork.com/toonami/. How do I find out the original url? (Or is there some other solution that gets around the problem?)

Replies are listed 'Best First'.
Re: Parallel LWP
by nardo (Friar) on Apr 19, 2001 at 08:24 UTC
    I don't know of any way to get back the original url, but if you don't want to follow redirects automatically, do a $pua->redirect(0). When you get a response, $res->code will contain the HTTP status code. If it is in the 300-399 range, you should check for a Location header and handle the redirection in your own code, making sure that your code remembers what the original url was.
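A minimal sketch of the check described above, assuming redirects have already been turned off with $pua->redirect(0). The helper name redirect_target and the %redirected_to hash are made up for illustration, and the response is built by hand here so the snippet runs without touching the network; in real use it would come from $entries->{$_}->response.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper (not part of LWP): return the Location header of a
# 3xx response, or undef if the response is not a redirect.
sub redirect_target {
    my ($res) = @_;
    return unless $res->code >= 300 && $res->code <= 399;
    return scalar $res->header('Location');
}

# Hand-built stand-in for a response fetched with redirects disabled.
my $original_url = 'http://www.toonami.com';
my $res = HTTP::Response->new(302, 'Found');
$res->header(Location => 'http://www.cartoonnetwork.com/toonami/');
$res->request(HTTP::Request->new(GET => $original_url));

# Remember where each original url pointed, so later database updates
# can still be keyed on the url that was actually requested.
my %redirected_to;
if (defined(my $target = redirect_target($res))) {
    $redirected_to{$original_url} = $target;
}
```

A second request for the Location target would then have to be registered by hand, with %redirected_to (or something like it) carrying the original url along.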
Re: Parallel LWP
by xunker (Beadle) on Mar 22, 2002 at 19:59 UTC

    "How do I find out the original url?"

    I have been doing some work with LWP::Parallel lately and needed to do the same thing -- this is not a pretty way, but after a few hours of mucking around with it, I found out this works:

    my $entries = $pua->wait();
    foreach my $entry (keys %$entries) {
        my $res = $entries->{$entry}->response;
        my $prev = $res->previous;          # the response that redirected us
        my $prev_req = $prev->request;      # the request that produced it
        my $original_url = $prev_req->uri;  # this is where it is
    }

    Undoubtedly there are ways to streamline this, etc, but it does work -- for me at least. Now if I could just figure out the "Out of Memory!" errors I've been getting...
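One possible way to streamline the approach above is sketched here, assuming each response links back through ->previous as described. The code above dereferences $res->previous directly, which fails when a site did not redirect at all and only steps back one hop when there were several. The helper name original_uri is invented for this example, and the responses are built by hand so the snippet runs offline.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper (not part of LWP): follow the chain of ->previous
# responses back to the first one and return the uri of its request.
# A response that was never redirected simply returns its own request uri.
sub original_uri {
    my ($res) = @_;
    while (my $prev = $res->previous) {
        $res = $prev;
    }
    return $res->request->uri;
}

# Hand-built stand-in for a fetch that was redirected once.
my $first = HTTP::Response->new(301, 'Moved Permanently');
$first->request(HTTP::Request->new(GET => 'http://www.toonami.com'));

my $final = HTTP::Response->new(200, 'OK');
$final->request(HTTP::Request->new(GET => 'http://www.cartoonnetwork.com/toonami/'));
$final->previous($first);

print original_uri($final), "\n";
```

Inside the loop over %$entries this would be called as original_uri($entries->{$entry}->response), and the result could be used in place of the $counter lookup into @url_list.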