Re: Re: Using HTTP::LinkExtor to get URL and description info

I've been doing some web automation and I'm using HTML::TreeBuilder everywhere.

I was also afraid it would be overkill, but you don't have to use the power you don't need, and it has made a few things much easier than they would have been with other tools.

Btw, I think your code could be improved in this way:

use HTML::TreeBuilder;
use strict;  # examples aren't exempt!!!
use warnings;

my $parser = HTML::TreeBuilder->new;
$parser->parse($html_code_from_elsewhere);
$parser->eof();  # tell the parser the input is complete

my @links = $parser->look_down('_tag' => 'a');
foreach my $link (@links) {
   my $href  = $link->attr('href');
   my $descr = $link->as_text();  # all text inside the anchor, markup stripped
}
$parser->delete();  # free the tree explicitly

This drops the assumption that the anchor contains only simple text, and it returns only the text inside the anchor element. Your code would have picked up markup elements embedded in the anchor, as in:

<a href="..."><p class="big-and-bold">Winners!</p> for today</a>

Fetching $link->content->[0] on the above would get you an HTML::Element object rather than a string.
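To make the difference concrete, here is a minimal sketch comparing the first child node with as_text(). The input string, the example.com URL, and the variable names are my own assumptions for illustration, and I've used a <b> inside the anchor instead of the <p> above so the result doesn't depend on how the parser treats block elements inside an anchor.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input: an anchor whose first child is markup, not plain text.
my $html = '<a href="http://example.com/"><b>Winners!</b> for today</a>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

my $link  = $tree->look_down('_tag' => 'a');  # scalar context: first match only
my $first = $link->content->[0];              # first child node of the anchor

# Child nodes are either plain strings (text) or HTML::Element objects,
# so ref() distinguishes the two cases.
my $first_is_element = ref $first ? 1 : 0;
my $text = $link->as_text;                    # all text below the anchor

print "first child is an element, not a string\n" if $first_is_element;
print "as_text gives: $text\n";

$tree->delete;
```

So if you only want the description, as_text() saves you from checking each child with ref() yourself.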

I know you pointed out this limitation, but I think the original Seeker might like to have the as_text() method pointed out, since extracting text from HTML appears to be the thing of interest.
