mkurtis has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone have any ideas on how to make a crawler obey robots.txt rules? Here's the crawler so far:

#!/usr/bin/perl
use LWP::Simple;
use HTML::SimpleLinkExtor;
use Data::Dumper;

my $content = get("http://www.yahoo.com");
die "get failed" if (!defined $content);

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($content);

my @links = $extor->a;
foreach my $links (@links) {
    print "$links\n";
}
print $content;
Thanks a bunch!

janitored by ybiC: Balanced <code> tags around codeblock
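
One way to approach this is sketched below, using WWW::RobotRules on top of the LWP::Simple fetch already in the script. The agent string and URLs are placeholders, and the code is an untested outline of the idea rather than anything posted in the thread:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use WWW::RobotRules;
use HTML::SimpleLinkExtor;

# Placeholder agent name; use your crawler's real name here
my $rules = WWW::RobotRules->new('MyCrawler/0.1');

my $url        = "http://www.yahoo.com";
my $robots_url = "http://www.yahoo.com/robots.txt";

# Fetch and parse the site's robots.txt before crawling the site
my $robots_txt = get($robots_url);
$rules->parse($robots_url, $robots_txt) if defined $robots_txt;

# Only fetch pages that robots.txt allows for this agent
if ($rules->allowed($url)) {
    my $content = get($url);
    die "get failed" if !defined $content;

    my $extor = HTML::SimpleLinkExtor->new();
    $extor->parse($content);
    print "$_\n" for $extor->a;
}
else {
    print "robots.txt disallows $url\n";
}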

Replies are listed 'Best First'.
•Re: obeying robot rules
by merlyn (Sage) on Feb 19, 2004 at 01:33 UTC
Re: obeying robot rules
by leriksen (Curate) on Feb 19, 2004 at 03:06 UTC
    Try also LWP::RobotUA

    +++++++++++++++++
    #!/usr/bin/perl
    use warnings; use strict; use brain;
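    use LWP::RobotUA;

    # From here on this is only a sketch of the idea, assuming LWP::RobotUA
    # as suggested above; the agent name and email address are placeholders.
    my $ua = LWP::RobotUA->new('my-crawler/0.1', 'me@example.com');
    $ua->delay(1);    # wait at least one minute between requests to a host

    # LWP::RobotUA fetches each site's robots.txt itself and refuses to
    # request pages that the rules disallow for this agent.
    my $response = $ua->get('http://www.yahoo.com');
    if ($response->is_success) {
        print $response->content;
    }
    else {
        print "Blocked or failed: ", $response->status_line, "\n";
    }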