in reply to Re: Performance of possessive quantifiers
in thread Performance of possessive quantifiers

Yes, that's right, and it explains why the posessive and atomic1 have the same performance impact.

(The parse tree that perl5.10.0 -cMre=debug -we 'm/(?:be|ea|a)++\d/' and perl5.10.0 -cMre=debug -we 'm/(?>(?:be|ea|a)+)\d/' are indeed identical).

But it doesn't explain the performance compared to the non-possessive non-atomic form.

Replies are listed 'Best First'.
Re^3: Performance of possessive quantifiers
by hipowls (Curate) on Jan 27, 2008 at 22:55 UTC

    Good question, I suspect that it is an optimization. Trying

    my $str = "bea" x 100; my $re = qr/(?:be|ea|a)/; sub atomic { use re 'debug'; say 'Matched $re+\d' if $str =~ m/$re+\d/; } sub possessive { use re 'debug'; say 'Matched $re++\d' if $str =~ m/$re++\d/; } atomic(); possessive();
    The regexes compile to
    # atomic Compiling REx "(?-xism:(?:be|ea|a))+\d" Final program: 1: CURLYX[0] {1,32767} (14) 3: TRIE-EXACT[abe] (13) <be> <ea> <a> 13: WHILEM[1/1] (0) 14: NOTHING (15) 15: DIGIT (16) 16: END (0) minlen 2 #possessive Compiling REx "(?-xism:(?:be|ea|a))++\d" Final program: 1: SUSPEND (19) 3: CURLYX[0] {1,32767} (16) 5: TRIE-EXACT[abe] (15) <be> <ea> <a> 15: WHILEM[1/1] (0) 16: NOTHING (17) 17: SUCCEED (0) 18: TAIL (19) 19: DIGIT (20) 20: END (0) minlen 2 # regex.ato and regex.pos contain the output including # the Compiling REx message michael@lnx-main:working> wc regex.ato regex.pos 10650 70471 1837158 regex.ato 229368 1989603 47988305 regex.pos
    As you can see it is doing a lot more. I don't know why but it will be interesting to find out;)

    Update 1: It may be something to do with caching. Looking in the re debug output I find

    michael@lnx-main:working> grep Detected regex.ato whilem: Detected a super-linear match, switching on caching... michael@lnx-main:working> grep Detected regex.pos whilem: Detected a super-linear match, switching on caching... # Now look for this line # whilem: (cache) already tried at this position... michael@lnx-main:working> grep '(cache)' regex.ato | wc 298 2086 26426 michael@lnx-main:working> grep '(cache)' regex.pos | wc 0 0 0
    At this point I am prepared to confess that I will struggling a bit to dig in deeper;-)

    Update 2: The SUSPEND/TAIL pair look like where any interested monk should start. In perldebguts I found this

    # Do nothing NOTHING no Match empty string. # A variant of above which delimits a group, thus stops optimizations TAIL no Match empty string. Can jump here from outside.