in reply to Using LWP::Parallel

This is not an answer to your question. Rather it is a comment for everyone who answered it without noticing this very important detail.

What you have written is a robot. It is very bad netiquitte to not look for and respect robots.txt. Any time anyone asks a question where it is clear from their code that they have written a robot that doesn't do this, please make sure to bring up robots.txt, and point to WWW::RobotRules. (Which comes with LWP.)

Replies are listed 'Best First'.
Re: Re (tilly) 1: Using LWP::Parallel
by hossman (Prior) on Mar 09, 2002 at 04:04 UTC
    Given that:
    • The User agent is "OpticDB LinkCheck/0.1"
    • The list of links pinged is stored in a Database
    • The pages aren't being scraped for more links
    The assumption that it's a Robot seems misplaced. Seems more like a site analyzer to me. (ie: check that 'important' urls are working.)
      Thank you for all your replies! After some more help from c.l.p.m, I've managed to reach thus far (code below). However, some questions remains unanswered, ie, the need to limit the number of parallel connections.

      And for the record, this is just a script to check if the links I have in a database are correct, ie, I'm checking for a return status code of 200, and if not, grab the status code so I can delete them from the database.

      sub check_links_results { print $query->header; use LWP::Parallel::UserAgent qw(:CALLBACK); my $ua = LWP::Parallel::UserAgent->new; $ua->nonblock(1); $ua->agent("OpticDB LinkCheck/0.1"); connect_to_db(); my $clock_start = time(); $sth = $dbh->prepare("SELECT url_en,id FROM $DB_MYSQL_NAME"); $sth->execute (); my %ids; while( my ($url, $id) = $sth->fetchrow_array ) { $ids{$url} = $id; $ua->register(HTTP::Request->new(GET => $url)); } $sth->finish; $dbh->disconnect; my $responses = $ua->wait; my $clock_finish = time - $clock_start; # end timer + and compare $time_taken = sprintf ("%.2f", $clock_finish); # trim time +to 2 decimal points my ($count, $htmlout) = (0, ""); while( (undef, my $entry) = each %$responses ) { my $req = $entry->request; my $res = $entry->response; my $id = $ids{$req->url}; next if $res->code == 200; ++$count; $tmpl_show_record .= qq| <table width="95%" border="0" cellspacing="0" cell +padding="2"> <tr> <td width="2%" align="middle">&nbsp;</td> <td width="6%" bgcolor="#EEEECC" align="right" val +ign="top"><font face="Arial, Helvetica, sans-serif" size="2">$ref->{i +d}</font>&nbsp;</td> <td width="58%" bgcolor="#E9EBEF">&nbsp;<font face +="Arial, Helvetica, sans-serif" size="2">$ref->{'name_en'}</font></td +> <td width="20%" bgcolor="#FFDDDD">&nbsp;<font face +="Arial, Helvetica, sans-serif" size="2">$res_code : $res_msg</font>< +/td> <td width="14%" bgcolor="#EEEECC" valign="top" ali +gn="center"><a href="odb.cgi?action=edit_record&id=$ref->{'id'}"><img + src="/images/icons/edit.gif" width="15" height="15" alt="[ edit ]" b +order="0"></a> &nbsp; <a href="odb.cgi?action=del_record&id=$ref->{'id'} +" onClick="return confirm('Delete record $ref->{'id'}?')"><img src="/ +images/icons/delete.gif" width="15" height="15" alt="[ delete ]" bord +er="0"></a> &nbsp; <a href="odb.cgi?action=toggle_live&id=$ref->{'id' +}"> |; if ($data_status eq "Live") { $tmpl_show_record .= "<img src=\"/images/icons/liv +eyes.gif\" border=\"0\">"; } else { $tmpl_show_record .= "<img src=\"/images/icons/liv +eno.gif\" border=\"0\">"; } $tmpl_show_record .= qq| </a> </td> </tr> </table> <BR> |; } $num_dead = $count; if( $count == 0 ) { &error_html("No dead links found!"); exit; } $dbh->disconnect; &parse_template("$PATH_TEMPLATE/check_links_results.tmpl"); }


      --
      Wiliam Stephens <wil@stephens.org>

        And for the record, this is just a script to check if the links I have in a database are correct, ie, I'm checking for a return status code of 200, and if not, grab the status code so I can delete them from the database.

        Not only 200 indicates a succesful state. Use $res->is_success instead ($res being your HTTP::Response object)

        44696420796F7520732F2F2F65206F
        7220756E7061636B3F202F6D736720
        6D6521203A29202D2D204A75657264