jonjacobmoon has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to use HTML::Parser to extract some very specific links from a page, but I also need to know the text of each link.
So, if a link is displayed as Foo, I want to be able to know that "Foo" corresponds to the URL it links to. This has got to be easy, but it is late and my brain is numb.
I am using HTML::Parser to get the tags. Here is some code:
#!/usr/bin/perl
use strict;
use lib "/home/jon/perl"; # where BrowserEmulator is
use HTML::Parser;
use BrowserEmulator; # this gets all the text from the page

my @SSNB;

# start of ParseLink
{
    package ParseLink;
    our @ISA = qw(HTML::Parser);

    # called by parse
    sub start {
        my ($this, $tag, $attr) = @_;
        if ($tag eq "a") {
            $this->{links}{$attr->{href}} = 1;
        }
    }

    sub get_links {
        my $this = shift;
        return keys %{$this->{links}};
    }
}

my $test_url = shift;
my $string = &BrowserEmulator::getFullSource($test_url);
my $p = ParseLink->new;
$p->parse($string);
for ($p->get_links) {
    print "LINK: $_\n";
}
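One direction that might work, as a rough sketch only: remember the href when an `a` start tag is seen, append any text that arrives while the link is open, and store the pair when the matching end tag shows up. The version below uses HTML::Parser's version 3 handler interface rather than the subclass above, and it parses a literal string instead of calling BrowserEmulator. The helper name `extract_links` and the sample HTML are made up for illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns a list of
# [href, text] pairs, one for each <a href="..."> element in $html.
sub extract_links {
    my ($html) = @_;
    my @links;      # finished [href, text] pairs, in document order
    my $current;    # the link currently open, or undef outside a link

    my $p = HTML::Parser->new(
        api_version => 3,
        # Start tag: open a new [href, text] record for <a href=...>.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $tag eq 'a' && defined $attr->{href};
            $current = [ $attr->{href}, '' ];
        }, 'tagname, attr' ],
        # Text: may arrive in several chunks, so append rather than assign.
        text_h => [ sub {
            my ($text) = @_;
            $current->[1] .= $text if $current;
        }, 'dtext' ],
        # End tag: close the record when </a> is reached.
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'a' && $current;
            push @links, $current;
            $current = undef;
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return @links;
}

for my $link (extract_links('<p><a href="http://example.com/">Foo</a></p>')) {
    print "LINK: $link->[0] TEXT: $link->[1]\n";
}
```

A list of pairs is used instead of a hash keyed on href so that two links pointing at the same URL with different text are both kept. Text inside nested markup such as `<b>` within the link is still appended, since only the `a` end tag closes the record.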
Replies are listed 'Best First'.
(crazyinsomniac) Re: Getting the Linking Text from a page
by crazyinsomniac (Prior) on Mar 13, 2002 at 08:18 UTC
by jonjacobmoon (Pilgrim) on Mar 13, 2002 at 10:35 UTC
Re: Getting the Linking Text from a page
by Corion (Patriarch) on Mar 13, 2002 at 08:31 UTC
Re: Getting the Linking Text from a page
by gellyfish (Monsignor) on Mar 13, 2002 at 09:15 UTC