Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Say, a string has embedded palindromes (PD); a PD in this case has odd (at least 3) number of letters. I want a piece of code to be executed exactly once for every PD including nested ones. Let "piece of code" be 'say'. Regex to extract a PD is well-known, and of course TIMTOWTDI, the question is not how to better solve the task.

Almost accidentally I found a solution, which I'm puzzled about. I don't know if it could be a simpler SSCCE without recursion.

say 'match' if 'abABCDCBAdeXYZYXfg' =~ / (?>^.*?) ( ( (.)(?-3)\g-1 | (.).\g-1 ) (?{ say $2 }) ) /x;

Output:

YZY XYZYX CDC BCDCB ABCDCBA

Bingo, just what I need. Match fails -- it's irrelevant. But HOW? If either of the following is removed: (1) "^" anchor; (2) independence (s/>/:/), or (3) entire 1st pair of parens, then there are multiple visits to each PD i.e. not what I want. In fact, for (3) the PD groups are also visited in reversed order, this I don't understand neither. And why both PD groups (not only 1st one) are visited.

Being greedy (s/\*\?/*/) produces no output, this I understand (I hope).

If, in case of code above i.e. without removing anything, the engine gives characters back one by one (I thought this is how backtracking works) before it ultimately fails when it (suddenly, somehow) remembers about independence, then I'd expect YZY is found and said; and then XYZYX is found and embedded code executed for both YZY and XYZYX, i.e. both are said (i.e. YZY said twice in total).

P.S. I've seen "Embedded Code Execution Frequency" section in perlre.

Replies are listed 'Best First'.
Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by choroba (Cardinal) on Aug 04, 2025 at 22:25 UTC
    BTW, to find all palindrome substrings efficiently, you can use the structure called Palindromic Tree or Eertree. See String::Eertree on CPAN and my blogpost about why I created it.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Thanks, choroba, your module, blog post and linked paper look interesting. What follows isn't really on topic.

      I lamented here recently about Rosetta Perl snippets quality; here it is again. Sort of consolation for me (after dear ikegami was kind enough to enlighten me about near total lack of understanding about regexes), that there are people even sloppier. It's a bit surprising you bothered to benchmark against that code (give each and every advantage to your opponent, no?). To wit:

      (1) This sub uses package vars, the array grows an each call during benchmarking, kind of unfair position you put that contestant to, no? (2) It's a shock to see that all the innermost loop does is string reversing. (3) The condition in "if" is never true (4) the 2nd "for" gives the same result about half of the time because there's string overshoot beyond its length (5) I'd filter out duplicates as they appear instead of grepping later. Then:

      sub rosetta_code_new { my ( $str ) = @_; my ( @pal, %seen ); for my $n ( 0 .. length( $str ) - 1 ) { for my $m ( 1 .. length( $str ) - $n ) { my $strpal = substr $str, $n, $m; push @pal, $strpal if $strpal eq reverse $strpal and not $seen{ $strpal } ++ } } return @pal }

      And:

      Rate rosetta eertree eertree 1159/s -- -27% rosetta 1584/s 37% --

      Not exactly the result as published, methinks. To be fair, your solution outpaces the above naive one if e.g. "10" is increased to "20", but still, not by orders of magnitude:

      Rate rosetta eertree rosetta 387/s -- -31% eertree 565/s 46% --

      I won't publish/compare with "my" recursive regex solution, it's slower. Not "orders of magnitude" neither, but, yeah, slow.

        Interesting. You are right that the original Rosetta code could have been simplified and made more efficient. The Eertree algorithm still scales better, but you need to stretch the string a bit more. Fore example, for lengths 55 and 110:
        Rate rosetta_old rosetta_new eertree rosetta_old 145/s -- -92% -93% rosetta_new 1925/s 1226% -- -13% eertree 2214/s 1424% 15% -- Rate rosetta_old rosetta_new eertree rosetta_old 20.0/s -- -96% -98% rosetta_new 464/s 2218% -- -60% eertree 1166/s 5729% 151% --
        As you can see, the new Rosetta code is 4 times slower, but Eertree is only 2 times slower.

        Here's the code for those interested:

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        is never true is always true

Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by ikegami (Patriarch) on Aug 04, 2025 at 21:09 UTC

    Your question appears to be "why is the anchor needed?"

    Honestly, instead of spending time analysing your code, I'd rather show how to write this cleanly. Especially since your code doesn't think 1221 is a palindrome.

    First, let's define palindrome.

    An empty string? No. Too weird.
    A single character? No. Again, too weird.

    So a palindrome looks like this:

    • xx
    • xyx
    • axxa
    • axyxa
    • baxxab
    • baxyxab
    • ...

    Since the minimum length of a palindrome (as we defined it) is two, we see it's always going to be a pair of matching chars, with the following in between:

    • Nothing ("xx")
    • A single character ("xyx")
    • A palindrome ("a...a", "b...b", ...)

    So we get

    echo abABCDCBAdeXYZYXfg1221 | perl -Mv5.14 -nle' / ( (.) (?: (?-2) | .? ) \g-1 ) (?{ say $1 }) (*FAIL) /xs '

    Same, but using named subpatterns:

    echo abABCDCBAdeXYZYXfg1221 | perl -Mv5.14 -nle' / ( (?&PALINDROME) ) (?{ say $1 }) (*FAIL) (?(DEFINE) (?<PALINDROME> (.) (?: (?&PALINDROME) | .? ) \g-1 ) ) /xs '

    (*FAIL) is reached after finding and printing a palindrome. It forces a backtrack, which is to say it causes the engine to look for another palindrome.

Re: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by ikegami (Patriarch) on Aug 04, 2025 at 18:52 UTC

    Gonna start with this:

    (?>^.*?)
    simplifies to
    ^

    .*? will always starting by attempting to match 0 characters. And the (?>...) prevents it from matching anything else.

    (All I have time for now.)

      simplifies to ^

      Oh... I didn't understand the whole construct, then. Thanks! I guess "Possessive sub-pattern with non-greedy..., etc." can be stricken out from title. To be replaced with, start-of-string anchor and failure? Because,

      Match fails -- it's irrelevant

      But looks like it is. And, in parallel, comparing with look-ahead, which at 1st glance can be a simpler alternative (?):

      say $-[1], ': ', $1 while 'olololo' =~ / (?= ( (.)(?-2)\g-1 | (.).\g-1 ) ) /gx; say '*'; say 'match' if 'olololo' =~ / ^ ( ( (.)(?-3)\g-1 | (.).\g-1 ) (?{ say $-[2], ': ', $2 }) ) (?#*F) /x;

      Output:

      0: olololo 1: lolol 2: ololo 3: lol 4: olo * 4: olo 3: lol 2: ololo 2: olo 1: lolol 0: olololo match

      Look-ahead misses some PD's for obvious reasons. But match success in case of recursion leads to omission of 2 PD's, too. Why? Converting "comment" to "fail" i.e. (F*) gives all nine PD's, visited exactly once each (with some disruption to "visit in reverse" order):

      4: olo 3: lol 2: ololo 2: olo 1: lolol 0: olololo 1: lol 0: ololo 0: olo

        omission of 2 omission of 3