petdance has asked for the wisdom of the Perl Monks concerning the following question:

I understand how possessive quantifiers work, but I'm stuck thinking of a real example of when one would want to use them. perlre and perlreref have none.

Have you used possessive quantifiers in your code? Why? Please show a sample regex and explain why it needed them.

Edit: Besides quoted strings as mentioned below, do you have any real-world examples of needing to use a possessive quantifier?

xoxo,
Andy

Re: When would I want to use possessive quantifiers?
by ikegami (Patriarch) on Jul 07, 2010 at 17:34 UTC

    Without knowing the internals of the regex engine,

    'aaaaaaaaaaaaaaa' =~ /^a+b/

    would try

    1. 'aaaaaaaaaaaaaaa' followed by 'b'?
    2. 'aaaaaaaaaaaaaa' followed by 'b'?
    3. 'aaaaaaaaaaaaa' followed by 'b'?
    4. 'aaaaaaaaaaaa' followed by 'b'?
    5. ...
    6. 'a' followed by 'b'?
    7. FAIL

    You could prevent that by using

    'aaaaaaaaaaaaaaa' =~ /^a++b/
    1. 'aaaaaaaaaaaaaaa' followed by 'b'?
    2. FAIL

    That said, /^a+b/ is already highly optimised. (The regex engine searches for "ab" and fails if it doesn't exist.) But replace "a" and "b" with more complex patterns, and it's up to you to provide the optimisation yourself by using a possessive quantifier.

    Compare

    $ perl -Mre=debug -e'"aaaaaaaaaaaaaaa" =~ /^a+[bB]/'
    ...
    Matching REx "^a+[bB]" against "aaaaaaaaaaaaaaa"
       0 <> <aaaaaaaaaa>          |  1:BOL(2)
       0 <> <aaaaaaaaaa>          |  2:PLUS(5)
                                       EXACT <a> can match 15 times out of 2147483647...
      15 <aaaaaaaaaaaa> <>        |  5:  ANYOF[Bb][](16)
                                         failed...
      14 <aaaaaaaaaaa> <a>        |  5:  ANYOF[Bb][](16)
                                         failed...
      13 <aaaaaaaaaa> <aa>        |  5:  ANYOF[Bb][](16)
                                         failed...
      12 <aaaaaaaaa> <aaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
      11 <aaaaaaaa> <aaaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
      10 <aaaaaaa> <aaaaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
       9 <aaaaaa> <aaaaaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
       8 <aaaaa> <aaaaaaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
       7 <aaaaa> <aaaaaaaa>       |  5:  ANYOF[Bb][](16)
                                         failed...
       6 <aaaaa> <aaaaaaaaa>      |  5:  ANYOF[Bb][](16)
                                         failed...
       5 <aaaaa> <aaaaaaaaaa>     |  5:  ANYOF[Bb][](16)
                                         failed...
       4 <aaaa> <aaaaaaaaaa>      |  5:  ANYOF[Bb][](16)
                                         failed...
       3 <aaa> <aaaaaaaaaa>       |  5:  ANYOF[Bb][](16)
                                         failed...
       2 <aa> <aaaaaaaaaa>        |  5:  ANYOF[Bb][](16)
                                         failed...
       1 <a> <aaaaaaaaaa>         |  5:  ANYOF[Bb][](16)
                                         failed...
                                       failed...
    Match failed
    ...

    $ perl -Mre=debug -e'"aaaaaaaaaaaaaaa" =~ /^a++[bB]/'
    ...
    Matching REx "^a++[bB]" against "aaaaaaaaaaaaaaa"
       0 <> <aaaaaaaaaa>          |  1:BOL(2)
       0 <> <aaaaaaaaaa>          |  2:SUSPEND(9)
       0 <> <aaaaaaaaaa>          |  4:  PLUS(7)
                                         EXACT <a> can match 15 times out of 2147483647...
      15 <aaaaaaaaaaaa> <>        |  7:    SUCCEED(0)
                                           subpattern success...
      15 <aaaaaaaaaaaa> <>        |  9:ANYOF[Bb][](20)
                                       failed...
    Match failed
    ...
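    For these two patterns the possessive version only saves work; it does not change which strings match. A minimal sketch to check that on a few inputs (the helper names greedy_match and possessive_match are made up for illustration, and the sample strings are my own):

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers need Perl 5.10 or later

# Hypothetical helpers wrapping the two patterns from the debug runs.
sub greedy_match     { return $_[0] =~ /^a+[bB]/  ? 1 : 0 }
sub possessive_match { return $_[0] =~ /^a++[bB]/ ? 1 : 0 }

# Print both results side by side for a few sample strings.
for my $str ('aaaaaaaaaaaaaaa', 'aaab', 'aaaB', 'b') {
    printf "%-17s greedy=%d possessive=%d\n",
        "'$str'", greedy_match($str), possessive_match($str);
}
```

    The results agree here because [bB] can never match an "a", so nothing that a+ could give back would ever help the rest of the pattern.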
Re: When would I want to use possessive quantifiers?
by Limbic~Region (Chancellor) on Jul 07, 2010 at 17:17 UTC
    petdance,
    I freely admit I have never used possessive quantifiers, nor can I think of an example where I might want to. I was skeptical of your statement that perlre didn't have an example, so I looked:

      For instance, the typical "match a double-quoted string" problem can be most efficiently performed when written as: /"(?:[^"\\]++|\\.)*+"/...

    It goes on to say that it is just syntactic sugar and could be re-written as /"(?>(?:(?>[^"\\]+)|\\.)*)"/
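    A small sketch of the two spellings side by side. The patterns are the ones quoted from perlre; the sub names and the sample strings are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers and // need Perl 5.10 or later

# Possessive-quantifier spelling of the double-quoted string matcher.
sub quoted_possessive {
    my ($text) = @_;
    return $text =~ /("(?:[^"\\]++|\\.)*+")/ ? $1 : undef;
}

# Equivalent spelling with atomic groups, (?>...).
sub quoted_atomic {
    my ($text) = @_;
    return $text =~ /("(?>(?:(?>[^"\\]+)|\\.)*)")/ ? $1 : undef;
}

# One terminated string with escaped quotes, one unterminated string.
for my $text ('say "a \"quoted\" word" if $x;', 'say "never closed') {
    say quoted_possessive($text) // 'no match';
    say quoted_atomic($text)     // 'no match';
}
```

    On the unterminated input both versions give up as soon as the end of the string is reached, instead of retrying shorter and shorter prefixes of the string body.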

    Cheers - L~R

      It goes on to say that it is just syntactic sugar
      They are indeed syntactic sugar. But I prefer to write: /a++/ over /(?>a+)/. 2 special characters instead of 5.

      As for the OP's question, the main use is performance, especially in cases of not matching. Using possessive quantifiers changes the meaning: unlike non-greedy matching, possessive quantifiers can change the matching/non-matching behaviour. That is, given a pattern, making quantifiers non-greedy (or making non-greedy quantifiers greedy) will not change the set of strings the pattern matches, but making quantifiers possessive can change the set (but only by reducing it). But I would not recommend using possessive quantifiers for that effect; that's just too subtle.
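      The point about the set of matched strings can be seen with a toy pattern. This is only a sketch: the helper name and the test strings are my own, not from perlre.

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers need Perl 5.10 or later

# Hypothetical helper: 1 if $str matches $re, otherwise 0.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

my %pattern = (
    greedy     => qr/^a+a$/,
    lazy       => qr/^a+?a$/,
    possessive => qr/^a++a$/,
);

for my $str ('a', 'aa', 'aaa') {
    for my $kind (qw(greedy lazy possessive)) {
        printf "%-10s %-5s %d\n", $kind, "'$str'", matches($pattern{$kind}, $str);
    }
}
```

      The greedy and lazy versions accept exactly the same strings (two or more a's), while the possessive version accepts none at all: a++ swallows every "a" and never gives one back for the trailing a.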

Re: When would I want to use possessive quantifiers?
by AnomalousMonk (Archbishop) on Jul 07, 2010 at 21:07 UTC
Re: When would I want to use possessive quantifiers?
by moritz (Cardinal) on Jul 08, 2010 at 06:29 UTC
    You always use possessive quantifiers and/or other backtracking control when you're convinced that what you've parsed right now is right.

    Suppose you want to parse something like Perl code, and you see

    from /foo/

    First you'd match an identifier, say with \w+, and then a regex. However, if that parse fails (for example because there's another line and no semicolon), then instead of giving up, \w+ backtracks to match just fro, and the regex-parsing code matches m /foo/ (which is a valid regex too).

    This is both slower than failing outright and can lead to misparses in some situations. So you'd really want to match an identifier as \w++ instead of just \w+.
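    A minimal sketch of that misparse, using a made-up toy grammar rather than real Perl parsing: an identifier followed by a regex literal that, in this toy, must be introduced by m. The sub names and the grammar are assumptions for illustration only.

```perl
use strict;
use warnings;
use 5.010;   # possessive quantifiers and // need Perl 5.10 or later

# Toy grammar (an assumption): identifier, optional space, then m/.../.
sub parse_greedy {
    my ($src) = @_;
    return $src =~ m{^(\w+)\s*(m\s*/[^/]*/)} ? ($1, $2) : ();
}

# Same grammar, but the identifier is matched possessively.
sub parse_possessive {
    my ($src) = @_;
    return $src =~ m{^(\w++)\s*(m\s*/[^/]*/)} ? ($1, $2) : ();
}

for my $src ('from /foo/', 'foo m/bar/') {
    my @g = parse_greedy($src);
    my @p = parse_possessive($src);
    say "greedy:     ", @g ? "ident='$g[0]' regex='$g[1]'" : 'no parse';
    say "possessive: ", @p ? "ident='$p[0]' regex='$p[1]'" : 'no parse';
}
# For 'from /foo/' the greedy version reports ident='fro' regex='m /foo/',
# while the possessive version reports no parse.
```

    For well-formed input such as 'foo m/bar/' the two versions agree; they differ only when the first attempt fails and backtracking into the identifier finds a different, unintended split.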

    The standard Perl 6 grammar uses Perl 6 regexes to parse Perl 6. Every time you see a token or a rule, it's a non-backtracking regex.

    There, backtracking control is crucial for getting good parse error messages, and for speed.