in reply to Re: Using HTTP::LinkExtor to get URL and description info
in thread Using HTTP::LinkExtor to get URL and description info
I was also afraid of overkill, but when you don't need the power, you don't have to use it, and it has made a few things really easy compared to what I could do with other tools.
Btw, I think your code could be improved in this way:
    use HTML::TreeBuilder;
    use strict;    # examples aren't exempt!!!
    use warnings;

    my $parser = HTML::TreeBuilder->new;
    $parser->parse($html_code_from_elsewhere);
    $parser->eof();    # signal end of input so the tree is complete

    my @links = $parser->look_down('_tag' => 'a');
    foreach my $link (@links) {
        my $href  = $link->attr('href');
        my $descr = $link->as_text();    # text only, embedded markup stripped
    }
    $parser->delete();    # free the tree when done
This drops the assumption that the anchor contains only simple text, and it extracts just the text from the anchor element. Your code would have returned any markup elements embedded in the anchor, as in:
<a href="..."><p class="big-and-bold">Winners!</p> for today</a>
Fetching $link->content->[0] on the above would get you an HTML::Element object rather than the text.
I know you pointed out this limitation, but I think the original Seeker might like to have the as_text() method pointed out, since extracting text from HTML appears to be the thing of interest.
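To make the difference concrete, here is a minimal sketch that compares the first child of an anchor with what as_text() returns. The sample HTML string and the /winners URL are made up for illustration (I used an inline <b> instead of the <p> above to keep the parse uncontroversial), and the variable names are my own, so treat it as a demonstration rather than a drop-in replacement.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, modeled on the anchor discussed above.
my $html = '<html><body><a href="/winners">'
         . '<b class="big-and-bold">Winners!</b> for today</a></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof();    # tell the parser the document is complete

# In list context look_down() returns every match; keep the first one.
my ($link) = $tree->look_down('_tag' => 'a');

# content_list() returns the child nodes, which are either
# HTML::Element objects or plain strings.
my ($first) = $link->content_list;
my $first_is_element = ( ref($first) && $first->isa('HTML::Element') ) ? 1 : 0;

my $href  = $link->attr('href');
my $descr = $link->as_text;    # text of the anchor and all its descendants

print "first child is an element: $first_is_element\n";
print "$href => $descr\n";

$tree->delete;    # free the tree
```

The first child here is the <b> element, so anything that expects a string at that position would get an object instead, while as_text() walks the whole subtree and hands back only the text.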