Possessive sub-pattern with non-greedy content + recursion: WHY does this work??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Say, a string has embedded palindromes (PD); a PD in this case has odd (at least 3) number of letters. I want a piece of code to be executed exactly once for every PD including nested ones. Let "piece of code" be 'say'. Regex to extract a PD is well-known, and of course TIMTOWTDI, the question is not how to better solve the task.

Almost accidentally I found a solution, which I'm puzzled about. I don't know if it could be a simpler SSCCE without recursion.

say 'match' if 'abABCDCBAdeXYZYXfg' =~ /
    (?>^.*?)
    ( 
        (
            (.)(?-3)\g-1 | (.).\g-1 
        )
        (?{ 
            say $2
        })
    )
/x;
[download]

Output:

YZY
XYZYX
CDC
BCDCB
ABCDCBA
[download]

Bingo, just what I need. Match fails -- it's irrelevant. But HOW? If either of the following is removed: (1) "^" anchor; (2) independence (s/>/:/), or (3) entire 1st pair of parens, then there are multiple visits to each PD i.e. not what I want. In fact, for (3) the PD groups are also visited in reversed order, this I don't understand neither. And why both PD groups (not only 1st one) are visited.

Being greedy (s/\*\?/*/) produces no output, this I understand (I hope).

If, in case of code above i.e. without removing anything, the engine gives characters back one by one (I thought this is how backtracking works) before it ultimately fails when it (suddenly, somehow) remembers about independence, then I'd expect YZY is found and said; and then XYZYX is found and embedded code executed for both YZY and XYZYX, i.e. both are said (i.e. YZY said twice in total).

P.S. I've seen "Embedded Code Execution Frequency" section in perlre.

Comment on Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? Select or Download Code

Replies are listed 'Best First'.
Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by choroba (Cardinal) on Aug 04, 2025 at 22:25 UTC
BTW, to find all palindrome substrings efficiently, you can use the structure called Palindromic Tree or Eertree. See String::Eertree on CPAN and my blogpost about why I created it. `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l]
Re^2: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by Anonymous Monk on Aug 05, 2025 at 23:41 UTC
Thanks, choroba, your module, blog post and linked paper look interesting. What follows isn't really on topic. I lamented here recently about Rosetta Perl snippets quality; here it is again. Sort of consolation for me (after dear ikegami was kind enough to enlighten me about near total lack of understanding about regexes), that there are people even sloppier. It's a bit surprising you bothered to benchmark against that code (give each and every advantage to your opponent, no?). To wit: (1) This sub uses package vars, the array grows an each call during benchmarking, kind of unfair position you put that contestant to, no? (2) It's a shock to see that all the innermost loop does is string reversing. (3) The condition in "if" is never true (4) the 2nd "for" gives the same result about half of the time because there's string overshoot beyond its length (5) I'd filter out duplicates as they appear instead of grepping later. Then: `sub rosetta_code_new { my ( $str ) = @_; my ( @pal, %seen ); for my $n ( 0 .. length( $str ) - 1 ) { for my $m ( 1 .. length( $str ) - $n ) { my $strpal = substr $str, $n, $m; push @pal, $strpal if $strpal eq reverse $strpal and not $seen{ $strpal } ++ } } return @pal }` [download] And: `Rate rosetta eertree eertree 1159/s -- -27% rosetta 1584/s 37% --` [download] Not exactly the result as published, methinks. To be fair, your solution outpaces the above naive one if e.g. "10" is increased to "20", but still, not by orders of magnitude: `Rate rosetta eertree rosetta 387/s -- -31% eertree 565/s 46% --` [download] I won't publish/compare with "my" recursive regex solution, it's slower. Not "orders of magnitude" neither, but, yeah, slow.	[reply] [d/l] [select]
Re^3: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by choroba (Cardinal) on Aug 06, 2025 at 19:41 UTC
Interesting. You are right that the original Rosetta code could have been simplified and made more efficient. The Eertree algorithm still scales better, but you need to stretch the string a bit more. Fore example, for lengths 55 and 110: `Rate rosetta_old rosetta_new eertree rosetta_old 145/s -- -92% -93% rosetta_new 1925/s 1226% -- -13% eertree 2214/s 1424% 15% -- Rate rosetta_old rosetta_new eertree rosetta_old 20.0/s -- -96% -98% rosetta_new 464/s 2218% -- -60% eertree 1166/s 5729% 151% --` [download] As you can see, the new Rosetta code is 4 times slower, but Eertree is only 2 times slower. Here's the code for those interested: Read more... (2 kB) `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l] [select]
Re^4: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by Anonymous Monk on Aug 06, 2025 at 22:13 UTC
Re^5: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by choroba (Cardinal) on Aug 06, 2025 at 23:13 UTC
Some notes below your chosen depth have not been shown here
Re^3: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by Anonymous Monk on Aug 05, 2025 at 23:53 UTC
~~is never true~~ is always true	[reply]
Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by ikegami (Patriarch) on Aug 04, 2025 at 21:09 UTC
Your question appears to be "why is the anchor needed?" Honestly, instead of spending time analysing your code, I'd rather show how to write this cleanly. Especially since your code doesn't think 1221 is a palindrome. First, let's define palindrome. An empty string? No. Too weird. A single character? No. Again, too weird. So a palindrome looks like this: xx xyx axxa axyxa baxxab baxyxab ... Since the minimum length of a palindrome (as we defined it) is two, we see it's always going to be a pair of matching chars, with the following in between: Nothing ("xx") A single character ("xyx") A palindrome ("a...a", "b...b", ...) So we get `echo abABCDCBAdeXYZYXfg1221 \| perl -Mv5.14 -nle' / ( (.) (?: (?-2) \| .? ) \g-1 ) (?{ say $1 }) (FAIL) /xs '` [download] Same, but using named subpatterns: `echo abABCDCBAdeXYZYXfg1221 \| perl -Mv5.14 -nle' / ( (?&PALINDROME) ) (?{ say $1 }) (FAIL) (?(DEFINE) (?<PALINDROME> (.) (?: (?&PALINDROME) \| .? ) \g-1 ) ) /xs '` [download] `(*FAIL)` is reached after finding and printing a palindrome. It forces a backtrack, which is to say it causes the engine to look for another palindrome.	[reply] [d/l] [select]
Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by ikegami (Patriarch) on Aug 04, 2025 at 18:52 UTC
Gonna start with this: `(?>^.?)` [download] simplifies to `^` [download] `.?` will always starting by attempting to match 0 characters. And the `(?>...)` prevents it from matching anything else. (All I have time for now.)	[reply] [d/l] [select]
Re^2: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by Anonymous Monk on Aug 04, 2025 at 19:39 UTC
simplifies to ^ Oh... I didn't understand the whole construct, then. Thanks! I guess "Possessive sub-pattern with non-greedy..., etc." can be stricken out from title. To be replaced with, start-of-string anchor and failure? Because, Match fails -- it's irrelevant But looks like it is. And, in parallel, comparing with look-ahead, which at 1st glance can be a simpler alternative (?): `say $-[1], ': ', $1 while 'olololo' =~ / (?= ( (.)(?-2)\g-1 \| (.).\g-1 ) ) /gx; say ''; say 'match' if 'olololo' =~ / ^ ( ( (.)(?-3)\g-1 \| (.).\g-1 ) (?{ say $-[2], ': ', $2 }) ) (?#F) /x;` [download] Output: `0: olololo 1: lolol 2: ololo 3: lol 4: olo * 4: olo 3: lol 2: ololo 2: olo 1: lolol 0: olololo match` [download] Look-ahead misses some PD's for obvious reasons. But match success in case of recursion leads to omission of 2 PD's, too. Why? Converting "comment" to "fail" i.e. `(F*)` gives all nine PD's, visited exactly once each (with some disruption to "visit in reverse" order): `4: olo 3: lol 2: ololo 2: olo 1: lolol 0: olololo 1: lol 0: ololo 0: olo` [download]	[reply] [d/l] [select]
Re^3: Possessive sub-pattern with non-greedy content + recursion: WHY does this work?? by Anonymous Monk on Aug 04, 2025 at 19:42 UTC
~~omission of 2~~ omission of 3	[reply]