in reply to new patent crawler
The problem so far is that it just outputs the links to the screen. I think they're stored in an array; how can I access each one individually so I can do a get for each one? Thanks.

    #!/usr/bin/perl -w
    # xurl - extract unique, sorted list of links from URL
    use HTML::LinkExtor;
    use LWP::Simple;

    $base_url = shift;
    $parser = HTML::LinkExtor->new(undef, $base_url);
    $parser->parse(get($base_url))->eof;
    @links = $parser->links;
    foreach $linkarray (@links) {
        my @element = @$linkarray;
        my $elt_type = shift @element;
        while (@element) {
            my ($attr_name , $attr_value) = splice(@element, 0, 2);
            $seen{$attr_value}++;
        }
    }
    for (sort keys %seen) { print $_, "\n" }
Edit by castaway - code tags
Re^2: new patent crawler
by tachyon (Chancellor) on Sep 07, 2004 at 08:39 UTC