in reply to Regexp to extract HTML link data

Under the blind assumption that your data won't change too much or become 'faulty' (otherwise you'd be using a parser, right?), something like this ought to do:
    my $re = qr{
        (?: <img \s+ .*? src=" ([^"]+) " .*? > )?   # optional <img>, captures src into $1
        <a \s+ .*? href=" ([^"]+) " .*? >            # the <a> tag, captures href into $2
    }x;

    my $in = '<td><img src="foo.jpg"><a href="index3.html">New index</a></td>';
    my ($href, $img) = grep defined, reverse $in =~ $re;
    print "href - $href\nimg - $img\n";

    $in = '<td><a href="index3.html">New index</a></td>';
    ($href, $img) = grep defined, reverse $in =~ $re;
    print "href - $href\nimg - $img\n";

    __output__
    href - index3.html
    img - foo.jpg
    href - index3.html
    img -
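To see what `grep defined, reverse` is working with, it helps to look at what the match returns in list context: one element per capture group, in order, with `undef` for the optional `<img>` group when it does not take part in the match. A minimal check, using the same pattern and the same two inputs:

```perl
use strict;
use warnings;

# Same pattern: group 1 is the optional <img> src, group 2 the <a> href.
my $re = qr{ (?: <img \s+ .*? src=" ([^"]+) " .*? > )? <a \s+ .*? href=" ([^"]+) " .*? > }x;

for my $in ('<td><img src="foo.jpg"><a href="index3.html">New index</a></td>',
            '<td><a href="index3.html">New index</a></td>') {
    # In list context the match returns ($1, $2); $1 is undef when the
    # optional <img> group did not participate in the match.
    my @caps  = $in =~ $re;
    my $shown = join ', ', map { defined $_ ? $_ : 'undef' } @caps;
    print "($shown)\n";
}
# prints:
# (foo.jpg, index3.html)
# (undef, index3.html)
```

Since the captures come back in a fixed order, `my ($img, $href) = $in =~ $re;` gives the same two values directly; `reverse` just puts the href first and `grep defined` drops the missing img.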
See perlre for more info.
HTH

_________
broquaint

Re: Re: Regexp riddles
by hatter (Pilgrim) on Jul 17, 2003 at 14:06 UTC
    Thanks, that looks like just the ticket. And your assumptions are correct: HTML parsers, um, no thank you. The input happens to be HTML, but it's in a very simple, fairly fixed format, and the problem could just as easily be expressed without HTML tags. And I'm hoping to wrap it all up in a map() (there's lots of data to iterate over) so it's much neater.
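Wrapping the match in `map` works naturally, because the match already returns its captures as a list. A minimal sketch, assuming the rows arrive as one `<td>...</td>` string per element of a hypothetical `@rows` array, and collecting an `[href, img]` pair for each row that contains a link:

```perl
use strict;
use warnings;

my $re = qr{ (?: <img \s+ .*? src=" ([^"]+) " .*? > )? <a \s+ .*? href=" ([^"]+) " .*? > }x;

# Hypothetical input: one table cell per element.
my @rows = (
    '<td><img src="foo.jpg"><a href="index3.html">New index</a></td>',
    '<td><a href="index3.html">New index</a></td>',
    '<td>no link here</td>',
);

# Each matching row yields one [href, img] pair ($img is undef when the row
# has no <img>). A failed match returns an empty list, so both variables are
# undef and the row contributes nothing to @links.
my @links = map {
    my ($img, $href) = $_ =~ $re;
    defined $href ? [ $href, $img ] : ();
} @rows;

for my $pair (@links) {
    my ($href, $img) = @$pair;
    printf "href - %s\nimg - %s\n", $href, defined $img ? $img : '';
}
```

Returning `()` for rows without a link lets `map` act as a filter at the same time, so no separate `grep` pass is needed.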

    Now, off to spend more time staring hard at the solution until its lessons burn themselves deep into my brain.

    thanks

    the hatter