wojtyk has asked for the wisdom of the Perl Monks concerning the following question:
<a href = ...><img src= ...></img></a>
Essentially, I want all links that have an image as the "text" of an href. However, the crucial thing is that I need to know the src location of that image. All existing CPAN modules seem to toss that data away when they "textify" the link (the important img data inside the href is replaced with a useless "[IMG]" tag).
I've experimented with everything from WWW::Mechanize to HTML::LinkExtor to HTML::Tree to HTML::Parser to at least a dozen other things. I can't get anything to work right that doesn't textify first.
I really didn't want to homegrow a regex, but I'm running out of options.
Does anyone know the best way to do this?
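For reference, the task described above can be sketched without a regex by walking the parsed tree rather than the textified link. This is a minimal sketch, assuming the HTML::TreeBuilder module from the HTML::Tree distribution is installed; the helper name `image_links` and the sample markup are made up for illustration and do not come from any reply in this thread.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns a list of [href, src] pairs, one for each
# <img src=...> found inside an <a href=...> element.
sub image_links {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @pairs;

    # look_down in list context returns every element matching the criteria;
    # the qr/./ criteria require a non-empty href (or src) attribute.
    for my $anchor ( $tree->look_down( _tag => 'a', href => qr/./ ) ) {
        for my $img ( $anchor->look_down( _tag => 'img', src => qr/./ ) ) {
            push @pairs, [ $anchor->attr('href'), $img->attr('src') ];
        }
    }

    $tree->delete;    # release the tree explicitly
    return @pairs;
}

# Sample markup (invented for this example).
my $sample = '<a href="/home"><img src="/logo.png"></a> <a href="/about">About</a>';
for my $pair ( image_links($sample) ) {
    print "$pair->[0] -> $pair->[1]\n";
}
```

Because the anchor element itself is inspected, the img attributes stay available and nothing is collapsed to "[IMG]". Links whose content is plain text are skipped simply because the inner look_down finds no img element.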
Replies are listed 'Best First'.
Re: Extracting full links from HTML
by GrandFather (Saint) on Feb 02, 2007 at 10:26 UTC
Re: Extracting full links from HTML
by wfsp (Abbot) on Feb 02, 2007 at 10:53 UTC
Re: Extracting full links from HTML
by smahesh (Pilgrim) on Feb 02, 2007 at 10:12 UTC
Re: Extracting full links from HTML
by mirod (Canon) on Feb 02, 2007 at 10:53 UTC
Re: Extracting full links from HTML
by Scott7477 (Chaplain) on Feb 02, 2007 at 18:17 UTC
Re: Extracting full links from HTML
by Anonymous Monk on Feb 02, 2007 at 10:14 UTC
Re: Extracting full links from HTML
by OfficeLinebacker (Chaplain) on Feb 03, 2007 at 19:11 UTC
by wojtyk (Friar) on Feb 05, 2007 at 16:43 UTC
by Anonymous Monk on Sep 06, 2007 at 21:38 UTC