in reply to Re^2: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
in thread Possessive sub-pattern with non-greedy content + recursion: WHY does this work??

Interesting. You are right that the original Rosetta code could have been simplified and made more efficient. The Eertree algorithm still scales better, but you need to stretch the string a bit more. Fore example, for lengths 55 and 110:
Rate rosetta_old rosetta_new eertree rosetta_old 145/s -- -92% -93% rosetta_new 1925/s 1226% -- -13% eertree 2214/s 1424% 15% -- Rate rosetta_old rosetta_new eertree rosetta_old 20.0/s -- -96% -98% rosetta_new 464/s 2218% -- -60% eertree 1166/s 5729% 151% --
As you can see, the new Rosetta code is 4 times slower, but Eertree is only 2 times slower.

Here's the code for those interested:

#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; use experimental qw( signatures ); use String::Eertree; sub rosetta_old($str) { my @pal; for my $n (1 .. length $str) { for my $m (1 .. length $str) { my $strrev = ""; my $strpal = substr $str, $n - 1, $m; if ($strpal ne "") { for my $p (reverse 1 .. length $strpal) { $strrev .= substr $strpal, $p - 1, 1; } ($strpal eq $strrev) and push @pal, $strpal; } } } my %seen; return grep ! $seen{$_}++, @pal } sub rosetta($str) { my (@pal, %seen); for my $n (0 .. length($str) - 1) { for my $m (1 .. length $str) { my $strpal = substr $str, $n, $m; push @pal, $strpal if $strpal eq reverse($strpal) && ! $seen{$strpal}++; } } return @pal } my $s = join "", map chr(ord('a') + int rand 26), 1 .. 55 * 2; say $s; my @p1 = 'String::Eertree'->new(string => "$s")->uniq_palindromes; my @p2 = rosetta_old($s); my @p3 = rosetta($s); use Test2::V0 -no_srand => 1; say "Eertree: @p1"; say "Rosetta old: @p2"; say "Rosetta new: @p3"; like \@p2, bag { item $_ for @p1; end() }; like \@p3, bag { item $_ for @p1; end() }; use Benchmark qw{ cmpthese }; cmpthese(-3, { eertree => sub { 'String::Eertree'->new(string => $s)->uniq_palind +romes }, rosetta_old => sub { rosetta_old($s) }, rosetta_new => sub { rosetta($s) }, }); done_testing();

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
  • Comment on Re^3: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
  • Select or Download Code

Replies are listed 'Best First'.
Re^4: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by Anonymous Monk on Aug 06, 2025 at 22:13 UTC

    Scaling better outcome isn't disputed, immediate results very much are. You missed - $n (see "overshoot" above). Perhaps you are tired I'm just irritating the hell out of people with non-issues, sorry

      Oh thank you. The difference moved to 110 versus 220:
      Rate rosetta_old rosetta_new eertree rosetta_old 20.0/s -- -98% -98% rosetta_new 864/s 4222% -- -27% eertree 1177/s 5784% 36% -- Rate rosetta_old rosetta_new eertree rosetta_old 2.63/s -- -99% -100% rosetta_new 215/s 8058% -- -67% eertree 654/s 24749% 205% -- 1..2

      I missed the - $n because I was trying to get all the palindromes from the Rosetta code instead of just the unique ones. It seemed the new code worked correctly, but the old one didn't, so I started experimenting with replicating its incorrect output...

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        Heh, just a week later and I kicked myself hard once and again to finally read the PDF paper in earnest, and, of other sources, only a less cruel text linked in a footnote in said PDF. The blog entry, linked from PWC #145 task #2, didn't open for me, which I didn't pursue. I.e. I didn't consult anyone's code before writing this implementation (speed, clarity, interface have, of course, room for improvement):

        package Eertree; use strict; use warnings; use Exporter 'import'; our @EXPORT_OK = qw/ ee /; sub ee { my $str = shift; my $FULL = shift; # if true: return "nodes" arrayref "as is", th +en DIY: # extract and expand to palindrome complete +list # (it's trivial, isn't it?), work with each # palindrome's position, do statistics, etc. # if false: just return unique palindromes as +a list # (note: it's faster because there's less in +fo # collected) my @nodes = ( { len => -1, lnk => 0 }, { len => 0, lnk => 0 } ); my $lpspi = 0; # "longest proper suffix-palindrome index" # (of substring processed so far) for my $i ( 0 .. length( $str ) - 1 ) { my $c = substr $str, $i, 1; my $current = $lpspi; my $n = $nodes[ $current ]; while ( $c ne substr $str, $i - $n-> { len } - 1, 1 ) { $current = $n-> { lnk }; $n = $nodes[ $current ] } my $suffix; # trailing palindrome index # after adding current character if ( $current == 0 ) { # lone character "palindrome" # (note: this and following branches have some code in com +mon i.e. # aren't dry, I didn't test whether consolidation would im +pair # speed, but I think a couple of vague conditionals, inste +ad, # could impair clarity if ( exists $n-> { edges }{ $c }) { # already seen? $suffix = $n-> { edges }{ $c }; push @{ $nodes[ $suffix ]{ offsets }}, $i if $FULL } else { my $p = { len => 1, lnk => 1, offsets => [ $i ]}; push @nodes, $p; $n-> { edges }{ $c } = $suffix = $#nodes } } else { if ( exists $n-> { edges }{ $c }) { # already seen? $suffix = $n-> { edges }{ $c }; _descend( $suffix, $i, \@nodes ) if $FULL } else { my $L = 2 + $n-> { len }; my $p = { len => $L, offsets => [ $i - $L + 1 ], }; push @nodes, $p; $n-> { edges }{ $c } = $suffix = $#nodes; do { $current = $n-> { lnk }; $n = $nodes[ $current ]; } while ( $c ne substr $str, $i - $n-> { len } - 1, 1 +); $p-> { lnk } = $n-> { edges }{ $c }; _descend( $p-> { lnk }, $i, \@nodes ) if $FULL } } # make sure it's "proper" $lpspi = ( $i + 1 ) == $nodes[ $suffix ]{ len } ? $nodes[ $suffix ]{ lnk } : $suffix; } return \@nodes if $FULL; splice @nodes, 0, 2; return map substr( $str, $_-> { offsets }[ 0 ], $_-> { len }), @no +des } sub _descend { my ( $x, $i, $nodes ) = @_; while ( $x > 1 ) { my $n_ = $nodes-> [ $x ]; push @{ $n_-> { offsets }}, $i - $n_-> { len } + 1; $x = $n_-> { lnk } } } 1;

        Then I checked PWC #146 i.e. next one, for recap; there were _very_ few blog links (it was New Year's vacation time, unfortunately), and only _one_ says its Perl is Python's port of _real_ eertree. This one I ran and found just too slow for short inputs; then it gave "deep recursion" (what???) warnings for still too short but slightly longer input. I terminated the process it took too long. Then I checked only 2 more random GH directory PWC #145 entries; didn't find eertree there neither. Found a funny excuse _not_ to do an eertree :-). I'd be _really_ grateful if someone points me to a GH PWC #145 Perl's eertree code.

        Which leaves String::Eertree only, I think. With a harness as in (grand)*parent node:

        my $s = join "", map chr(ord('a') + int rand 26), 1 .. 200; cmpthese(-3, { eertree => sub { 'String::Eertree'->new(string => $s)->uniq_palind +romes }, rosetta_new => sub { rosetta_fixed($s) }, ee => sub { ee($s) }, }); Rate rosetta_new eertree ee rosetta_new 271/s -- -58% -93% eertree 651/s 140% -- -84% ee 3964/s 1362% 509% --

        Which, I guess, is due to "Perl's subs/methods calls are too expensive" (_lots_ of calls in S::E internally) and/or "Moo"? or something else? But wait, comparison above is unfair ("give all advantage etc.") because 'ee' skips some work if only unique palindromes are required. Let's rather do this:

        cmpthese(-3, { eertree => sub { 'String::Eertree'->new(string => $s) }, ee => sub { ee($s, 1) }, }); Rate eertree ee eertree 648/s -- -80% ee 3199/s 394% --

        I.e. contestants consume their strings, then build internal structures as required, without producing unique or non-unique lists (note: 'ee' also provides offsets for each (non-unique) pali which S::E doesn't. Not too much work, but still...) Yeah, that's more fair.

        In fact random chars, very short (200 _is_ short) strings don't look interesting. I tried what follows with (what looked at the time as) more entertaining input -- a string up to 400+k chars with "real" pali randomly interspersed within. Maybe it's as uninteresting as previous. The purpose was to also check the linearity i.e. O(n)-ness which, as I squinted in a skewed way (guess in which favour) didn't look right to me at first. False alarm, both appear O(n) ("X" - thousands of repetitions of known pali (randomly placed) in the input; Y -- seconds; A -- ee, B -- S::E). But perhaps an input can be constructed which results in catastrophic slowdown in A and/or B. I have no proof i.e. can't write scientific PDF papers.

        + 2.5 +-------------------------------------------------+ | + + + + + + + + + | | | | | | B | | | | B | 2 |-+ +-| | B | | B | | | | B | | | | B | 1.5 |-+ B +-| | | | B | | | | B | | | | B | | B | 1 |-+ +-| | B | | | | B | | B A | | A A | | B A A | 0.5 |-+ A A +-| | B A A | | B A | | A A | | B A A | | B A A | | A A + + + + + + + + | 0 +-------------------------------------------------+ 0 2 4 6 8 10 12 14 16 18 20 use strict; use warnings; use feature 'say'; use Data::Dump 'dd'; use Time::HiRes 'time'; use String::Eertree; use lib '.'; use Eertree 'ee'; use Benchmark qw/ timeit :hireswallclock /; use Chart::Gnuplot; use File::Spec::Functions 'catfile'; use File::Basename 'dirname'; my $GP = catfile dirname( $^X ), '../../c/bin/gnuplot.exe'; die unless -x $GP; # assume Strawberry Perl use constant REPEATS => 5; STDOUT-> autoflush( 1 ); sub get_sample { my $n = shift; my $str = 'amanaplanacanalpanama'; my $s = ''; for ( 1 .. $n * 1000 ) { substr $s, rand( length $s ), 0, $str } return $s } my ( @xdata, @ydata1, @ydata2, @datasets ); for my $n ( 1 .. 20) { print "$n "; my $s = get_sample( $n ); push @xdata, $n; { my $t = timeit REPEATS, sub { ee( $s, 1 )}; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @ydata1, $y; } { my $t = timeit REPEATS, sub { String::Eertree-> new( string => + $s )}; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @ydata2, $y; } } print "\n\n"; push @datasets, Chart::Gnuplot::DataSet-> new( xdata => \@xdata, ydata => \@ydata1, style => 'points', ), Chart::Gnuplot::DataSet-> new( xdata => \@xdata, ydata => \@ydata2, style => 'points', ); my $chart = Chart::Gnuplot-> new( gnuplot => $GP, terminal => 'dumb size 60, 40', ); $chart-> plot2d( @datasets );

        Actually, thank you very much, @choroba, for introducing me to the subject. Very entertaining.