Re (tilly) 1: Regexp evaluation

It happens exactly as described in perlre. For the long version, pick up Mastering Regular Expressions.

The actual way it works is rather complex because of all the optimizations, but the naive behaviour that it falls back on is pretty simple. It starts at the beginning of the string and the beginning of the RE. It proceeds through the string and the RE, memorizing the spot every time it has to make a choice. Eventually it probably hits a dead end (the next character you are looking for is "k" and you saw "m", aw shucks), and then it backs up to the last choice it made and goes with the next option it has not tried. If it runs out of options entirely and the RE is not anchored, it moves one character further into the string and starts the RE over from there.
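To make the naive behaviour concrete, here is a toy backtracking matcher in Python. It is only a sketch, not Perl's engine: it understands literal characters, "." and a postfix greedy "*", with none of the optimizations. The names match_here and search are made up for illustration. The recursion stack does the "memorizing": each call that faces a choice tries one option, and if that comes back None (a dead end) it falls back to the other.

```python
# Toy backtracking matcher (illustration only, not Perl's real engine).
# Supports literal characters, "." (any character) and a postfix "*"
# (zero or more of the previous atom, greedy).

def match_here(pattern, text, p=0, t=0):
    """Try to match pattern[p:] against text starting at index t.
    Returns the end index of the match, or None on a dead end."""
    if p == len(pattern):
        return t  # reached the end of the RE: success
    atom = pattern[p]
    starred = p + 1 < len(pattern) and pattern[p + 1] == "*"
    if starred:
        # Choice point. Greedy: first try to match the atom once more...
        if t < len(text) and (atom == "." or text[t] == atom):
            end = match_here(pattern, text, p, t + 1)
            if end is not None:
                return end
        # ...and only if everything after that failed, skip past the "*".
        return match_here(pattern, text, p + 2, t)
    if t < len(text) and (atom == "." or text[t] == atom):
        return match_here(pattern, text, p + 1, t + 1)
    return None  # expected this atom, saw something else: aw shucks


def search(pattern, text):
    """Unanchored search: try each start position, left to right."""
    for start in range(len(text) + 1):
        end = match_here(pattern, text, 0, start)
        if end is not None:
            return start, end, text[start:end]
    return None
```

For example, search("a.*k", "xxamkmz") gives (2, 5, "amk"): it fails at positions 0 and 1, then at position 2 the ".*" runs to the end of the string and backs up one character at a time until the "k" fits.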

Stop and think about it: it is proceeding left to right in the string and basically left to right in the RE (repetition such as * can send it looping back around in the RE, though) in the most obvious manner possible.

Now you may hear that (.*) is greedy, while (.*?) is not. How does that work? Well, it is simple. Remember that it has to make choices? With either construct it faces a choice each time it could match another ".": it can try to match one more right away, or it can try to proceed with the rest of the RE. With (.*) it tries to match "." again first; with (.*?) it tries to proceed through the RE first. So (.*) winds up matching as many characters as it can while still letting the overall match succeed, while (.*?) matches as few as it can. (Greedy vs. non-greedy.)
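Here is the difference in action, using Python's re module as a stand-in for Perl (for these two constructs it makes the same greedy vs. non-greedy choices). The sample string is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: (.*) first tries to swallow the rest of the string, then backs up
# one character at a time until the final ">" can match.
greedy = re.search(r"<(.*)>", text).group(1)

# Non-greedy: (.*?) first tries zero characters and proceeds to ">",
# taking one more character only each time that attempt fails.
lazy = re.search(r"<(.*?)>", text).group(1)

print(greedy)  # b>bold</b> and <i>italic</i
print(lazy)    # b
```

Both searches start at the same leftmost "<"; greediness only changes how far the match extends, not where it begins.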

Now sit down with perlre and see if you can figure out the idea behind how it is implemented. When you feel comfortable, visit Death to Dot Star! for some of the gotchas. :-)
