I've read in several places that the '.*' construct is not the most efficient way to grab multiple characters because it is greedy: its first directive is to match as much as possible, and it gives characters back only when a sub-expression further along in the regex is required to match and otherwise can't.
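To make that giving-back concrete, here is a minimal sketch. The sample string 'abcabc' and the variable names are made up for illustration; they are not taken from anything I read.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only to show the backing-off behaviour.
my $sample = 'abcabc';

# Greedy: (.*) first grabs the entire string, then gives back one character
# at a time until the literal 'c' after it is able to match.
my ($captured) = $sample =~ /(.*)c/;

print "captured: $captured\n";    # prints "captured: abcab"
```

The capture stops just before the last 'c' rather than the first, because '.*' only gives back as much as it has to.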

That I understand. My question is this:
If a string contains two occurrences of a similar pattern, will the '.*' match across both of them if the terminating sequence is the same?

For example:
Say the string to be matched against is:
<a href="http://www.perlmonks.org">Perlmonks</a> is a great site, so is <a href="http://www.devshed.com">DevShed</a>

And say (for argument's sake; I would not use this regular expression other than for this question) that the regex used is:

$string =~ m%<a href="([^"]+)">(.*)</a>%;

So I wondered: would the '.*' construct in this case match only the text of the first link tag, or would it match everything from the end of the first URL up to the closing tag of the second link, so that $2 would contain:
Perlmonks</a> is a great site, so is <a href="http://www.devshed.com">DevShed

The answer is yes, it matches across both links (I tried it). But when the '.*?' construct is used, only 'Perlmonks' is matched. So the question mark apparently makes the '.*' construct non-greedy.
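For anyone who wants to reproduce this, here is roughly what I ran, as a small sketch. The string and both regexes are the ones from this post; the variable names are just ones I picked for the example.

```perl
use strict;
use warnings;

my $string = '<a href="http://www.perlmonks.org">Perlmonks</a> is a great site, so is <a href="http://www.devshed.com">DevShed</a>';

# Greedy: (.*) runs to the end of the string, then backs off to the LAST </a>.
my ($url_greedy, $text_greedy) = $string =~ m%<a href="([^"]+)">(.*)</a>%;

# Non-greedy: (.*?) grows one character at a time and stops at the FIRST </a>.
my ($url_lazy, $text_lazy) = $string =~ m%<a href="([^"]+)">(.*?)</a>%;

print "greedy:     $text_greedy\n";
print "non-greedy: $text_lazy\n";
```

In both cases $1 is the first URL; only $2 differs between the two runs.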

So is it the optional nature of '?' that makes the '.*' construct non-greedy, or is it something else entirely?

Amel - f.k.a. - kel


In reply to How Greedy is Greedy: A Regex Question by dsb
