in reply to Cutting Out Previously Visited Web Pages in A Web Spider
sub crawl {
    my @queue = @_;
    my %visited;

    while (my $url = shift @queue) {
        next if $visited{$url};

        my $content = $http_ua->get($url);

        # do useful things with $content

        push @queue, $link_extractor->links($content);
        $visited{$url} = 1;
    }
}
That's all. Once size and efficiency really start to matter, you can look at moving the %visited data to something like Cache::Cache or Berkeley DB.
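As a rough sketch of that migration, the in-memory hash can be swapped for a hash tied to an on-disk Berkeley DB file, so the visited set survives between runs. This assumes the DB_File module is installed; the file name visited.db, the $fetch_links callback, and the fake %site data are made up for illustration and stand in for the real fetching and link extraction.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # tied-hash interface to Berkeley DB (assumed installed)

# Crawl from the given start URLs, remembering visited URLs in %$visited.
# $fetch_links is a hypothetical callback: given a URL, it returns the
# list of URLs linked from that page.
sub crawl {
    my ($visited, $fetch_links, @queue) = @_;
    while (defined(my $url = shift @queue)) {
        next if $visited->{$url};
        $visited->{$url} = 1;
        push @queue, $fetch_links->($url);
    }
}

my $db_path = 'visited.db';   # hypothetical file name
tie my %visited, 'DB_File', $db_path, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot open $db_path: $!";

# A fake three-page site stands in for real HTTP fetching.
my %site = (
    'http://example.com/'  => ['http://example.com/a', 'http://example.com/b'],
    'http://example.com/a' => ['http://example.com/'],
    'http://example.com/b' => ['http://example.com/a'],
);
crawl(\%visited, sub { @{ $site{$_[0]} || [] } }, 'http://example.com/');

print scalar(keys %visited), " pages visited\n";
untie %visited;
```

The crawl loop itself does not change; only the storage behind the hash does. Because the file persists, a second run with the same start URL will skip it as already visited, so delete visited.db to start a fresh crawl.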
Re: Re: Cutting Out Previously Visited Web Pages in A Web Spider
by kappa (Chaplain) on Mar 12, 2004 at 16:59 UTC
by mkurtis (Scribe) on Mar 13, 2004 at 01:32 UTC
by kappa (Chaplain) on Mar 13, 2004 at 11:02 UTC
by mkurtis (Scribe) on Mar 13, 2004 at 18:05 UTC
by kappa (Chaplain) on Mar 14, 2004 at 20:51 UTC

Re: Cutting Out Previously Visited Web Pages in A Web Spider
by mkurtis (Scribe) on Mar 12, 2004 at 03:01 UTC