in reply to Extracting full links from HTML

I'm with Grandpa on this one. I've used HTML::TreeBuilder with good results. The HTML::Element objects in the tree carry all of their attributes, so you can look for every element that has a 'src' attribute, all links, whatever you need.
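
Here is a minimal sketch of that idea, assuming HTML::TreeBuilder is installed from CPAN. The sample markup and the helper name elements_with_src are made up for illustration. look_down takes attribute/value pairs, and a qr// value is matched against the attribute's text:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): return [tag, src] pairs
# for every element that carries a non-empty 'src' attribute.
sub elements_with_src {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @found = map { [ $_->tag, $_->attr('src') ] }
                $tree->look_down( src => qr/./ );
    $tree->delete;    # release the tree explicitly
    return @found;
}

# Made-up sample page: one element with a src, one without.
my $html = '<html><body><img src="logo.png"><a href="/x">x</a></body></html>';
printf "%s: %s\n", @$_ for elements_with_src($html);
```

Swapping the criteria for something like _tag => 'a' would collect links instead; the values come back exactly as written in the markup, so relative URLs stay relative.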

I like computer programming because it's like Legos for the mind.

Replies are listed 'Best First'.
Re^2: Extracting full links from HTML
by wojtyk (Friar) on Feb 05, 2007 at 16:43 UTC
    TreeBuilder is actually what I ended up using, but the result felt and looked ugly. I'm really shocked that Mechanize doesn't already do this in its link extraction. After all, a WWW::Mechanize::Image class exists, so there's no reason to "textify" anything when it could just include a reference to a WWW::Mechanize::Image object. Thanks though, everybody :)
      I agree with most of you. But what do you do when the image is in an input tag, like: <input name="image1" type="image" src="images/go.gif" align="middle" width="25" height="19" border="0">? How would you extract such images?
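
One possible approach, sketched with HTML::TreeBuilder (assumed installed from CPAN): ask look_down for input elements whose type is 'image' and read their src. The helper name image_input_srcs and the surrounding form markup are hypothetical; only the input tag itself comes from the question above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the src of every <input type="image">.
# Tag names are lowercased by the parser, but attribute values are compared
# as exact strings, so type="IMAGE" would need a regex such as qr/^image$/i.
sub image_input_srcs {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @srcs = map { $_->attr('src') }
               $tree->look_down( _tag => 'input', type => 'image' );
    $tree->delete;    # release the tree explicitly
    return @srcs;
}

# The input tag from the question, wrapped in a made-up form.
my $html = <<'HTML';
<html><body><form action="/search">
<input name="image1" type="image" src="images/go.gif" align="middle"
       width="25" height="19" border="0">
</form></body></html>
HTML

print "$_\n" for image_input_srcs($html);
```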