in reply to lovely regexs

Perhaps change the regular expression to turn off the default "greediness" of the "*" by following it with a "?", so that it captures only up to the next quote character.
$moo =~ m/src="(.*?)"/;
However, I normally use HTML::TreeBuilder to parse and search HTML.
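
To see what the "?" changes in the regex above, here is a minimal sketch. The sample tag is made up for illustration (the original question's input is not shown here); only the variable name $moo comes from the post.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real contents of $moo are an assumption.
my $moo = '<img src="a.png" alt="logo">';

# Greedy: ".*" runs to the end of the string, then backs off to the LAST quote.
my ($greedy) = $moo =~ m/src="(.*)"/;

# Non-greedy: ".*?" grows one character at a time and stops at the FIRST quote.
my ($lazy) = $moo =~ m/src="(.*?)"/;

print "greedy: $greedy\n";   # greedy: a.png" alt="logo
print "lazy:   $lazy\n";     # lazy:   a.png
```

With a single closing quote directly after the value, the non-greedy form captures just the attribute value.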

Re^2: lovely regexs
by dsheroh (Monsignor) on Apr 12, 2009 at 22:20 UTC
    Ignoring, for the moment, the wisdom of using the proper tool (which is generally not a regex) for parsing HTML...

    The issue here is not greediness. The issue is the misuse of ".*". Making the "*" non-greedy is just a band-aid which masks the fact that ".*" says "match any number of any characters", when what you actually mean is "match any number of characters that are not a double quote". The correct way to write that regex is:

    $moo =~ m/src="([^"]*)"/;

    The non-greedy quantifier does have its legitimate uses, generally in cases where your target is terminated by a sequence of multiple characters. In cases where a negated character class can do the job, though, the character class will almost always be the better option.
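
One way to see why the non-greedy form is only a band-aid is to add more pattern after the closing quote. The sketch below uses a made-up two-tag string and a made-up trailing " alt" requirement, neither of which appears in the original question. Because "." can still match a double quote, ".*?" is free to stretch across attribute boundaries when that is the only way to make the overall match succeed, while [^"]* cannot.

```perl
use strict;
use warnings;

# Hypothetical input: two img tags, only the second has an alt attribute.
my $html = '<img src="a.png" class="x"> <img src="b.png" alt="y">';

# Non-greedy dot: starts at the first src=" and expands past quotes
# until it finds a position followed by '" alt'.
my ($lazy_hit) = $html =~ m/src="(.*?)" alt/;

# Negated class: can never cross a double quote, so the first tag fails
# to match and the regex engine moves on to the second one.
my ($class_hit) = $html =~ m/src="([^"]*)" alt/;

print "lazy:  $lazy_hit\n";    # lazy:  a.png" class="x"> <img src="b.png
print "class: $class_hit\n";   # class: b.png
```

The character class states the actual constraint (no double quotes inside the value), so the result does not depend on what the rest of the pattern forces the engine to do.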