in reply to new patent crawler

#!/usr/bin/perl -w
# xurl - extract unique, sorted list of links from URL
use HTML::LinkExtor;
use LWP::Simple;

$base_url = shift;
$parser   = HTML::LinkExtor->new(undef, $base_url);
$parser->parse(get($base_url))->eof;
@links = $parser->links;

foreach $linkarray (@links) {
    my @element  = @$linkarray;
    my $elt_type = shift @element;
    while (@element) {
        my ($attr_name, $attr_value) = splice(@element, 0, 2);
        $seen{$attr_value}++;
    }
}

for (sort keys %seen) { print $_, "\n" }
The problem so far is that it just outputs the links to the screen. I think they're stored in an array; how can I access each one individually so I can do a get for each one? Thanks.

Edit by castaway - code tags

Re^2: new patent crawler
by tachyon (Chancellor) on Sep 07, 2004 at 08:39 UTC

    Did you try anything? You are looping over the links and printing them! They are stored as the keys of a hash; this has been done to remove duplicates. If you want an array you could just do @links = keys %seen;
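
    For example, here is a minimal illustration of why the hash removes duplicates (the list is made up; hash keys are unique by definition):

    my %seen;
    $seen{$_}++ for qw( a b a c b );
    print join(' ', sort keys %seen), "\n";    # prints: a b c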

    But as you are *already* looping over them, did it cross your mind to get them as well? Here I am assigning to $link instead of using the default assignment to $_, for clarity of code...

    for my $link (sort keys %seen) {
        print "Getting $link.....";
        my $html = get($link);
        if ( $html =~ m/whatever/ ) {
            print "Wohoo!\n";
        } else {
            print "Bugger\n";
        }
    }
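
    One thing to watch: get() returns undef when a fetch fails, so under -w you probably want to guard the match first. A minimal sketch of the same loop with that check added (m/whatever/ is still just a placeholder pattern):

    for my $link (sort keys %seen) {
        print "Getting $link.....";
        my $html = get($link);          # undef if the fetch fails
        unless ( defined $html ) {
            print "fetch failed\n";
            next;
        }
        print $html =~ m/whatever/ ? "Wohoo!\n" : "Bugger\n";
    }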

    cheers

    tachyon