Anyone have any ideas on how to make a crawler obey robots.txt rules? Here's the crawler so far:
<code>
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use HTML::SimpleLinkExtor;
use Data::Dumper;

# Fetch the page and bail out if the request fails.
my $content = get("http://www.yahoo.com");
die "get failed" if (!defined $content);

# Pull the href targets out of every <a> tag on the page.
my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($content);
my @links = $extor->a;

foreach my $link (@links) {
    print "$link\n";
}
print $content;
</code>

Thanks a bunch!
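One way to get robots.txt handling more or less for free is to swap LWP::Simple for LWP::RobotUA (also part of libwww-perl): it fetches and caches each site's robots.txt, refuses requests the rules disallow, and paces repeat requests to the same host. A minimal sketch along those lines; the agent name and email address are placeholders, so adjust to taste:

<code>
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::SimpleLinkExtor;

# LWP::RobotUA is an LWP::UserAgent subclass that consults robots.txt
# before each request and throttles requests per host.
my $ua = LWP::RobotUA->new('MyCrawler/0.1', 'me@example.com');
$ua->delay(1);    # minimum minutes between requests to one server

my $response = $ua->get('http://www.yahoo.com');
die "get failed: " . $response->status_line . "\n"
    unless $response->is_success;

# Same link extraction as before, fed from the response object.
my $extor = HTML::SimpleLinkExtor->new();
$extor->parse($response->decoded_content);
print "$_\n" for $extor->a;
</code>

If you'd rather stick with LWP::Simple, WWW::RobotRules is another option: fetch each site's /robots.txt yourself, hand it to parse(), and check allowed($url) before every get().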
janitored by ybiC: Balanced <code> tags around codeblock