It happens exactly as described in perlre. For the long
version, pick up Mastering Regular Expressions.
The actual way it works is rather complex because of all of
the optimizations, but the naive behaviour that it falls
back on is pretty simple. It starts at the beginning of
the string and the beginning of the RE. It proceeds through
the string and the RE, and every time it has to make a choice
it memorizes that spot. Eventually it probably hits a
dead end (the next character you are looking for is "k" and
you saw "m", aw shucks), so it backs up to the last choice
it made and goes with the next option it has not tried. If it
runs out of choices altogether, it moves its starting point
one character further along the string and tries again.
Stop and think about it: it is proceeding left to right in
the string and basically left to right in the RE (quantifiers
like * can make it loop back around in the RE, though) in the
most obvious manner possible.
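To make that naive behaviour concrete, here is a minimal sketch of a backtracking matcher. It is written in Python rather than Perl and handles only a made-up toy subset (literal characters, ".", and the quantifiers * and *?). The names parse, match_here and search are hypothetical, and this is not how perl's engine is actually written. It is just the "fall-back" idea with none of the optimizations. Each starred item is a choice point: if the option taken leads to a dead end, the call returns None and the matcher backs up to try the other option.

```python
def parse(pattern):
    """Split a toy pattern into (atom, quantifier) pairs.

    Supported: literal characters, '.', and the quantifiers '*' (greedy)
    and '*?' (non-greedy). No escapes, groups, anchors or alternation.
    """
    items = []
    i = 0
    while i < len(pattern):
        atom = pattern[i]
        i += 1
        quant = None
        if i < len(pattern) and pattern[i] == "*":
            i += 1
            quant = "*"
            if i < len(pattern) and pattern[i] == "?":
                i += 1
                quant = "*?"
        items.append((atom, quant))
    return items


def atom_matches(atom, ch):
    # '.' matches any character in this toy (real Perl '.' skips "\n" by default).
    return atom == "." or atom == ch


def match_here(items, k, text, pos):
    """Match items[k:] against text starting at pos; return end index or None."""
    if k == len(items):
        return pos  # reached the end of the pattern: success
    atom, quant = items[k]

    if quant is None:
        # No choice to make: the character either matches or we hit a dead end.
        if pos < len(text) and atom_matches(atom, text[pos]):
            return match_here(items, k + 1, text, pos + 1)
        return None  # dead end; the caller backs up to its last choice

    def one_more():
        # Option A: let the starred atom eat one more character.
        if pos < len(text) and atom_matches(atom, text[pos]):
            return match_here(items, k, text, pos + 1)
        return None

    def move_on():
        # Option B: stop repeating and proceed with the rest of the pattern.
        return match_here(items, k + 1, text, pos)

    # This is a choice point. Greedy tries "one more" first; non-greedy tries
    # "move on" first. If the first option dead-ends, fall back to the other.
    first, second = (one_more, move_on) if quant == "*" else (move_on, one_more)
    end = first()
    return end if end is not None else second()


def search(pattern, text):
    """Return (start, end) of the leftmost match, or None."""
    items = parse(pattern)
    for start in range(len(text) + 1):  # try each start position, left to right
        end = match_here(items, 0, text, start)
        if end is not None:
            return start, end
    return None
```

For example, search("k", "mk") fails at position 0 ("m" is not "k"), then succeeds from position 1 and returns (1, 2). With the same string, search("<.*>", "<a><b>") returns (0, 6) while search("<.*?>", "<a><b>") returns (0, 3). The only difference between the two is which option the starred item tries first.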
Now you may hear that (.*) is greedy, while (.*?) is not.
How does that work? Well, it is simple. Remember that it
has to make choices? With either construct it has a
choice every time it could match another ".": it can try to
match one more right away, or it can try to proceed with the
rest of the RE. With (.*) it tries to match "." again first;
with (.*?) it tries to proceed through the RE first. So (.*)
will wind up matching as many characters as it can while
still managing to match overall, while (.*?) will match as
few as it can. (Greedy vs. non-greedy.)
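A quick way to see the difference is to run both constructs against the same string. The sketch below uses Python's re module as a stand-in for Perl, on the assumption that for simple patterns like these it backtracks the same way. In Perl, "<a><b>" =~ /<(.*)>/ should likewise leave "a><b" in $1.

```python
import re

text = "<a><b>"

# Greedy: (.*) first swallows "a><b>", then gives characters back
# one at a time until the final ">" can match.
greedy = re.search(r"<(.*)>", text)

# Non-greedy: (.*?) first tries to move on, taking one more character
# only when the ">" fails to match.
lazy = re.search(r"<(.*?)>", text)

print(greedy.group(1))  # a><b
print(lazy.group(1))    # a
```

Both patterns succeed. They just settle on different end points, because the engine stops backing up as soon as the overall match works.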
Now sit down with perlre and see if you can figure out
the idea behind how it is implemented. When you feel
comfortable, visit
Death to Dot Star! for some of the gotchas. :-)