mkurtis has asked for the wisdom of the Perl Monks concerning the following question:
Anyone have any ideas on how to make a crawler obey robots.txt rules? Here's the crawler so far:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::SimpleLinkExtor;
use Data::Dumper;

my $content = get("http://www.yahoo.com");
die "get failed" if (!defined $content);

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($content);

my @links = $extor->a;
foreach my $link (@links) {
    print "$link\n";
}
print $content;
```

Thanks a bunch!
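One common way to honor robots.txt (not spelled out in the post above) is to swap LWP::Simple for LWP::RobotUA, a subclass of LWP::UserAgent that fetches and caches each site's robots.txt, refuses requests the rules disallow, and throttles requests per host. A minimal sketch along those lines, keeping HTML::SimpleLinkExtor as the link extractor; the agent name and e-mail address are placeholders, not values from the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::SimpleLinkExtor;

# LWP::RobotUA consults robots.txt before each request and enforces a
# per-host delay. Agent name and contact address below are placeholders.
my $ua = LWP::RobotUA->new(
    agent => 'MyCrawler/0.1',
    from  => 'you@example.com',
);
$ua->delay(1);    # minimum delay between requests to one host, in minutes

my $response = $ua->get("http://www.yahoo.com");
die "get failed: ", $response->status_line unless $response->is_success;

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response->content);
print "$_\n" for $extor->a;
```

URLs that robots.txt disallows should come back as 403 responses rather than being fetched, so the same failure check covers both network errors and disallowed pages.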
Replies are listed 'Best First'.
Re: obeying robot rules
by merlyn (Sage) on Feb 19, 2004 at 01:33 UTC

Re: obeying robot rules
by leriksen (Curate) on Feb 19, 2004 at 03:06 UTC