
It is possible in one regular expression, though:
$html=~s/<IMG[^>]+?(?:ALT="([^"]*)"[^>]*)?>/"[image".((defined $1)?": \"$1\"":"")."]"/sgei;
short explanation:
The match starts with "<img" followed by anything that is not the end of the tag. That part must not be greedy, or it would also swallow the ALT part. The ALT part is optional, hence the "(?: )?" around it, which should be self-explanatory (with basic Perl knowledge).

We then substitute with an expression (the /e modifier evaluates the replacement as Perl code).
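
For readers who find the one-liner hard to follow, here is the same pattern spread out with the /x modifier. This is only a sketch: the helper name replace_images and the sample input are made up for illustration, and the replacement is written with qq{} to avoid the backslashed quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: turns IMG tags into "[image]" or "[image: "alt text"]".
sub replace_images {
    my ($html) = @_;
    $html =~ s{
        <IMG              # start of the tag (case-insensitive because of the i modifier)
        [^>]+?            # anything that is not the end of the tag, as little as possible
        (?:               # optional, non-capturing group for the ALT part
            ALT="([^"]*)" # the ALT text lands in capture group 1
            [^>]*         # whatever follows ALT inside the tag
        )?
        >                 # end of the tag
    }{
        # /e: this side is Perl code, evaluated once per match
        '[image' . (defined $1 ? qq{: "$1"} : '') . ']'
    }sgeix;
    return $html;
}

print replace_images(qq{<IMG SRC="foo"><BR>\n<img src="foo" alt="bar">\n});
# prints:
# [image]<BR>
# [image: "bar"]
```

Note that comments inside the pattern should avoid $ and @, since the pattern is interpolated before the comments are stripped.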

long explanation:
I am too lazy to write this.

Re (tilly) 3: Regex For HTML Image Tags?
by tilly (Archbishop) on Mar 27, 2001 at 17:13 UTC
    Let me see.

    You would also match text that looks like an IMG tag inside attribute values of other tags.

    You fail to consider that the closing > can appear in the values of other attributes for the IMG tag. There are quite a few which could have it.

    The alt attribute may be quoted with "", '', or nothing at all. You only deal with one of these cases.

    Whitespace is allowed between ALT and the =, and between the = and the value. Not accounted for.

    In my experience the odds of being bitten are highest for a different quote delimiter, then for munging text that appears inside quoted attribute values. The others are possible but unlikely.

    If you know your data, then an RE is OK. I have certainly done that. But if you don't, then an RE hack will break sooner or later...
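
The failure cases listed above can be reproduced with a small script. This is a sketch only: naive_replace is a hypothetical wrapper around the pattern from the parent post (with the replacement simplified), and the sample tags are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the parent post.
sub naive_replace {
    my ($html) = @_;
    $html =~ s/<IMG[^>]+?(?:ALT="([^"]*)"[^>]*)?>/"[image" . (defined $1 ? qq{: "$1"} : "") . "]"/sgei;
    return $html;
}

# Invented inputs, one per problem described above.
my %cases = (
    'single-quoted ALT'  => q{<img src="a.png" alt='logo'>},
    'unquoted ALT'       => q{<img src="a.png" alt=logo>},
    'space around ='     => q{<img src="a.png" alt = "logo">},
    '> inside a value'   => q{<img title="a > b" src="a.png" alt="logo">},
    'inside another tag' => q{<a title="<img src=x>" href="/">home</a>},
);

for my $name (sort keys %cases) {
    printf "%-20s %s\n", $name, naive_replace($cases{$name});
}
```

The first three cases silently lose the alt text and come out as a bare [image]. The fourth stops at the > inside the title value and leaves the rest of the tag behind as text. The last one rewrites part of an attribute value of an unrelated tag.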

Re:{3} Regex For HTML Image Tags?
by jeroenes (Priest) on Mar 27, 2001 at 14:21 UTC
    I had been munging on a regex as well (just an exercise), and I think the extended regex clarifies a bit:

        my $html = do { local $/; <DATA> };   # slurp all of DATA, not just the first line
        $html =~ s/<IMG \s+                   #match the IMG tag
            SRC \s* = \s* "[^"]+" \s*         #match the Source
            (ALT \s* = \s* "([^"]+)" \s*)?    #match an optional Alt
            >                                 #end of tag
            /'[image' . ($2 ? ": $2" : '') .']'   #print the image stuff
            /sgixe;
        print $html;
        __DATA__
        <IMG SRC="foo"><BR>
        bar bar bar<BR>
        <IMG SRC="foo" alt="bar">

    This works, but keep in mind that the IMG tag is still valid if, for example, the SRC and the ALT appear in reverse order.

    That's why HTML::TokeParser (as Desdinova pointed out already) or maybe even (if the HTML is yours) Template Toolkit are better approaches.
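
A rough sketch of the tokenizer approach, assuming the HTML::TokeParser module from CPAN is installed. The helper name replace_images_tp and the sample input are made up for illustration. Every token other than an IMG start tag is copied through unchanged.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN, part of the HTML-Parser distribution

# Hypothetical helper: replaces IMG tags with "[image]" or "[image: alt]".
sub replace_images_tp {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot set up parser";
    my $out = '';
    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S' && $t->[1] eq 'img') {
            # Tag and attribute names arrive lowercased, values unquoted.
            my $alt = $t->[2]{alt};
            $out .= '[image' . (defined $alt && length $alt ? ": $alt" : '') . ']';
        }
        elsif ($type eq 'S')  { $out .= $t->[4] }   # original text of a start tag
        elsif ($type eq 'E')  { $out .= $t->[2] }   # original text of an end tag
        elsif ($type eq 'PI') { $out .= $t->[2] }   # processing instruction
        else                  { $out .= $t->[1] }   # text, comment, declaration
    }
    return $out;
}

print replace_images_tp(qq{<IMG SRC="foo"><BR>\nbar bar bar<BR>\n<IMG alt='bar' SRC="foo">\n});
```

Attribute order and quoting style no longer matter here, because the parser hands over the attributes as a hash.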

    Cheers,

    Jeroen
    "We are not alone"(FZ)

Re: Re: Re: Regex For HTML Image Tags?
by alfie (Pilgrim) on Mar 27, 2001 at 13:22 UTC
    I knew about that (somewhere, deep hidden in my memories) - but couldn't find it quickly in the manual pages. Strangely it's the first modifier described in the perlop section *hmm*
    Thanks for pointing it out, I simply hadn't found it :)
    --
    Alfie