Malkavian has asked for the wisdom of the Perl Monks concerning the following question:

I was wondering about the evaluation order of Perl Regexps.
Do they evaluate the expression from left to right (i.e. do they try to complete the match in the order in which you wrote the expression), or does the regexp bind anchored matches first and then move on to unanchored ones?
As knowing this could help me structure regexps better, I'd love to know the answer.
Thanks,

Malk.

Replies are listed 'Best First'.
Re: Regexp evaluation
by chromatic (Archbishop) on Oct 14, 2000 at 04:05 UTC
    There are a couple of general rules I use to keep track of things.
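    • The engine prefers the leftmost match. If it can match something early in the string or late in the string, it prefers the former; from that starting point, greedy quantifiers then make it prefer the longer match.
    • Using an anchor (^, $, \A, \z) changes this, since it limits where a match may begin or end.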
    • The default quantifiers (* and +) are greedy. They prefer longer matches, and will happily gobble up everything in the string.
    • All of these rules apply to each element of the regex. If the first element of the regex is a greedy .*, it'll initially run to the end of the string (or of the line, without /s). To match the next element, the engine will backtrack one character at a time to see if it can match.
    • Internal optimizations in the engine can avoid some of the caveats these rules imply, but they tend not to change the rules.
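A minimal sketch of the rules above, assuming a made-up sample string; the variable names are only illustrative. It checks where the leftmost match starts, what a greedy .* ends up capturing after backtracking, how \A pins the match, and (as a related point) which branch an alternation picks.

```perl
use strict;
use warnings;

my $text = "cat bat cat";   # sample string, assumed for illustration

# Leftmost: "cat" occurs twice, and the engine reports the earlier one.
$text =~ /cat/ or die "no match";
my $leftmost_pos = $-[0];   # offset where the match starts (0 here)

# Greedy: .* first grabs the whole string, then gives characters back
# one at a time until "at" can match, so it stops before the last "at".
my ($greedy_prefix) = $text =~ /^(.*)at/;   # "cat bat c"

# Anchored: \A pins the match to the start of the string, so "bat"
# is not found even though it occurs later on.
my $anchored_bat = $text =~ /\Abat/ ? 1 : 0;   # 0

# Alternation: the first alternative that works at the leftmost position
# wins, even when a later alternative would have matched more text.
my ($alt) = "foobar" =~ /(foo|foobar)/;   # "foo"

print "leftmost_pos=$leftmost_pos\n";
print "greedy_prefix=$greedy_prefix\n";
print "anchored_bat=$anchored_bat\n";
print "alt=$alt\n";
```

Reversing the alternation to /(foobar|foo)/ would capture "foobar" instead, which is one reason the order in which you write alternatives matters.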
    I second the recommendation of Mastering Regular Expressions, though I hear Jeff's working on a new version at the moment.
Re (tilly) 1: Regexp evaluation
by tilly (Archbishop) on Oct 14, 2000 at 04:11 UTC
    It happens exactly as described in perlre. For the long version pick up Mastering Regular Expressions.

    The actual way it works is rather complex because of all of the optimizations, but the naive behaviour that it falls back on is pretty simple. It starts at the beginning of the string and the beginning of the RE. It proceeds through the string and the RE, memorizing the spot every time it has to make a choice. Eventually it probably hits a dead end (the next character it is looking for is "k" and it saw "m", aw shucks), and then it backs up to the last choice it had and goes with the next option it has not tried.

    Stop and think about it: it is proceeding left to right in the string and basically left to right in the RE (quantifiers can result in looping around in the RE though) in the most obvious manner possible.
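The naive behaviour described above can be sketched with a toy matcher. This is not how Perl's engine is implemented; it is a deliberately tiny illustration that only understands literal characters, "." and a "*" after a single character. The names toy_match, match_here, match_star and @trace are all hypothetical.

```perl
use strict;
use warnings;

my @trace;   # records each dead end so the backtracking is visible

# Match $re against the start of $str.
sub match_here {
    my ($re, $str) = @_;
    return 1 if $re eq '';                     # pattern used up: success
    my $first = substr($re, 0, 1);
    if (length($re) >= 2 && substr($re, 1, 1) eq '*') {
        return match_star($first, substr($re, 2), $str);
    }
    if ($str ne '' && ($first eq '.' || $first eq substr($str, 0, 1))) {
        return match_here(substr($re, 1), substr($str, 1));
    }
    push @trace, "dead end: wanted '$first', saw '" . substr($str, 0, 1) . "'";
    return 0;
}

# Greedy star: take as many characters as possible, then give them
# back one at a time until the rest of the pattern matches.
sub match_star {
    my ($char, $rest, $str) = @_;
    my $max = 0;
    $max++ while $max < length($str)
        && ($char eq '.' || $char eq substr($str, $max, 1));
    for (my $n = $max; $n >= 0; $n--) {        # each $n is a remembered choice
        return 1 if match_here($rest, substr($str, $n));
    }
    return 0;
}

# Try every start position from the left; the first success wins.
sub toy_match {
    my ($re, $str) = @_;
    for my $start (0 .. length($str)) {
        return $start if match_here($re, substr($str, $start));
    }
    return -1;
}

my $pos = toy_match("a.*k", "bank ram");
print "matched at offset $pos\n";              # offset 1
print "$_\n" for @trace;
```

In this run the ".*" first swallows "nk ram", and the trace shows the matcher wanting "k" and seeing "m", then "a", "r" and a space, before it has backed up far enough for the "k" to match.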

    Now you may hear that (.*) is greedy, while (.*?) is not. How does that work? Well, it is simple. Remember that it has to make choices? With either construct it has a choice each time "." matches a character: it can try to match another character right away, or it can try to proceed. With (.*) it tries to match "." again first; with (.*?) it tries to proceed through the RE first. So (.*) will wind up matching as many characters as it can while still managing to match overall, while (.*?) will match as few. (Greedy vs non-greedy.)
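A short illustration of that difference, assuming a made-up input string with two tagged words: the same pattern is run once with (.*) and once with (.*?).

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";   # sample input, assumed for illustration

# Greedy: .* keeps matching as long as it can, so the capture runs up to
# the LAST "</b>" that still lets the whole pattern succeed.
my ($greedy) = $html =~ m{<b>(.*)</b>};

# Non-greedy: .*? tries to move on through the RE first, so the capture
# stops at the FIRST "</b>" it can reach.
my ($lazy) = $html =~ m{<b>(.*?)</b>};

print "greedy: $greedy\n";   # one</b> and <b>two
print "lazy:   $lazy\n";     # one
```

Both patterns start matching at the same leftmost "<b>"; only the choice made at each step inside the quantifier differs.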

    Now sit down with perlre and see if you can figure out the idea behind how it is implemented. When you feel comfortable, visit Death to Dot Star! for some of the gotchas. :-)

RE: Regexp evaluation
by Adam (Vicar) on Oct 14, 2000 at 04:32 UTC
    Shame on you all! Shame!
    If you are going to tell someone to buy a book, by all means you should point them to the Perl Monks Gift Shoppe. Our new friend Malkavian would then be able to order the excellent Mastering Regular Expressions (J. Friedl, O'Reilly Press) and support the Monastery at the same time.

    And yes, Malkavian, you should get yourself a copy of that text. Not only will it help you with Perl regexes, but with the very concepts on which they are founded.

Re: Regexp evaluation
by japhy (Canon) on Oct 14, 2000 at 05:01 UTC
    MRE 2 is coming out in the spring. Hold out if you can.

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
Re: Regexp evaluation
by extremely (Priest) on Oct 14, 2000 at 03:51 UTC

    Oh man, you might as well just go get Jeffrey Friedl's book now. At O'Reilly.

    --
    $you = new YOU;
    honk() if $you->love(perl)