mkurtis has asked for the wisdom of the Perl Monks concerning the following question:
Anyone have any ideas on how to make a crawler obey robots.txt rules? Here's the crawler so far:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::SimpleLinkExtor;
use Data::Dumper;

my $content = get("http://www.yahoo.com");
die "get failed" if (!defined $content);

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($content);

my @links = $extor->a;
foreach my $link (@links) {
    print "$link\n";
}
print $content;
```

Thanks a bunch!
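One common way to honor robots.txt (not spelled out in the post above) is to swap LWP::Simple for LWP::RobotUA, a subclass of LWP::UserAgent that fetches and caches each site's robots.txt, refuses requests the rules disallow, and throttles requests per host. A minimal sketch along those lines, keeping HTML::SimpleLinkExtor as the link extractor; the agent name and e-mail address are placeholders, not values from the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::SimpleLinkExtor;

# LWP::RobotUA consults robots.txt before each request and enforces a
# per-host delay. Agent name and contact address below are placeholders.
my $ua = LWP::RobotUA->new(
    agent => 'MyCrawler/0.1',
    from  => 'you@example.com',
);
$ua->delay(1);    # minimum delay between requests to one host, in minutes

my $response = $ua->get("http://www.yahoo.com");
die "get failed: ", $response->status_line unless $response->is_success;

my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response->content);
print "$_\n" for $extor->a;
```

URLs that robots.txt disallows should come back as 403 responses rather than being fetched, so the same failure check covers both network errors and disallowed pages.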
Replies are listed 'Best First'.
Re: obeying robot rules
by merlyn (Sage) on Feb 19, 2004 at 01:33 UTC

Re: obeying robot rules
by leriksen (Curate) on Feb 19, 2004 at 03:06 UTC