in reply to Re^5: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
in thread Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
Heh, just a week later and I kicked myself hard once and again to finally read the PDF paper in earnest, and, of other sources, only a less cruel text linked in a footnote in said PDF. The blog entry, linked from PWC #145 task #2, didn't open for me, which I didn't pursue. I.e. I didn't consult anyone's code before writing this implementation (speed, clarity, interface have, of course, room for improvement):
package Eertree; use strict; use warnings; use Exporter 'import'; our @EXPORT_OK = qw/ ee /; sub ee { my $str = shift; my $FULL = shift; # if true: return "nodes" arrayref "as is", th +en DIY: # extract and expand to palindrome complete +list # (it's trivial, isn't it?), work with each # palindrome's position, do statistics, etc. # if false: just return unique palindromes as +a list # (note: it's faster because there's less in +fo # collected) my @nodes = ( { len => -1, lnk => 0 }, { len => 0, lnk => 0 } ); my $lpspi = 0; # "longest proper suffix-palindrome index" # (of substring processed so far) for my $i ( 0 .. length( $str ) - 1 ) { my $c = substr $str, $i, 1; my $current = $lpspi; my $n = $nodes[ $current ]; while ( $c ne substr $str, $i - $n-> { len } - 1, 1 ) { $current = $n-> { lnk }; $n = $nodes[ $current ] } my $suffix; # trailing palindrome index # after adding current character if ( $current == 0 ) { # lone character "palindrome" # (note: this and following branches have some code in com +mon i.e. # aren't dry, I didn't test whether consolidation would im +pair # speed, but I think a couple of vague conditionals, inste +ad, # could impair clarity if ( exists $n-> { edges }{ $c }) { # already seen? $suffix = $n-> { edges }{ $c }; push @{ $nodes[ $suffix ]{ offsets }}, $i if $FULL } else { my $p = { len => 1, lnk => 1, offsets => [ $i ]}; push @nodes, $p; $n-> { edges }{ $c } = $suffix = $#nodes } } else { if ( exists $n-> { edges }{ $c }) { # already seen? $suffix = $n-> { edges }{ $c }; _descend( $suffix, $i, \@nodes ) if $FULL } else { my $L = 2 + $n-> { len }; my $p = { len => $L, offsets => [ $i - $L + 1 ], }; push @nodes, $p; $n-> { edges }{ $c } = $suffix = $#nodes; do { $current = $n-> { lnk }; $n = $nodes[ $current ]; } while ( $c ne substr $str, $i - $n-> { len } - 1, 1 +); $p-> { lnk } = $n-> { edges }{ $c }; _descend( $p-> { lnk }, $i, \@nodes ) if $FULL } } # make sure it's "proper" $lpspi = ( $i + 1 ) == $nodes[ $suffix ]{ len } ? $nodes[ $suffix ]{ lnk } : $suffix; } return \@nodes if $FULL; splice @nodes, 0, 2; return map substr( $str, $_-> { offsets }[ 0 ], $_-> { len }), @no +des } sub _descend { my ( $x, $i, $nodes ) = @_; while ( $x > 1 ) { my $n_ = $nodes-> [ $x ]; push @{ $n_-> { offsets }}, $i - $n_-> { len } + 1; $x = $n_-> { lnk } } } 1;
Then I checked PWC #146 i.e. next one, for recap; there were _very_ few blog links (it was New Year's vacation time, unfortunately), and only _one_ says its Perl is Python's port of _real_ eertree. This one I ran and found just too slow for short inputs; then it gave "deep recursion" (what???) warnings for still too short but slightly longer input. I terminated the process it took too long. Then I checked only 2 more random GH directory PWC #145 entries; didn't find eertree there neither. Found a funny excuse _not_ to do an eertree :-). I'd be _really_ grateful if someone points me to a GH PWC #145 Perl's eertree code.
Which leaves String::Eertree only, I think. With a harness as in (grand)*parent node:
my $s = join "", map chr(ord('a') + int rand 26), 1 .. 200; cmpthese(-3, { eertree => sub { 'String::Eertree'->new(string => $s)->uniq_palind +romes }, rosetta_new => sub { rosetta_fixed($s) }, ee => sub { ee($s) }, }); Rate rosetta_new eertree ee rosetta_new 271/s -- -58% -93% eertree 651/s 140% -- -84% ee 3964/s 1362% 509% --
Which, I guess, is due to "Perl's subs/methods calls are too expensive" (_lots_ of calls in S::E internally) and/or "Moo"? or something else? But wait, comparison above is unfair ("give all advantage etc.") because 'ee' skips some work if only unique palindromes are required. Let's rather do this:
cmpthese(-3, { eertree => sub { 'String::Eertree'->new(string => $s) }, ee => sub { ee($s, 1) }, }); Rate eertree ee eertree 648/s -- -80% ee 3199/s 394% --
I.e. contestants consume their strings, then build internal structures as required, without producing unique or non-unique lists (note: 'ee' also provides offsets for each (non-unique) pali which S::E doesn't. Not too much work, but still...) Yeah, that's more fair.
In fact random chars, very short (200 _is_ short) strings don't look interesting. I tried what follows with (what looked at the time as) more entertaining input -- a string up to 400+k chars with "real" pali randomly interspersed within. Maybe it's as uninteresting as previous. The purpose was to also check the linearity i.e. O(n)-ness which, as I squinted in a skewed way (guess in which favour) didn't look right to me at first. False alarm, both appear O(n) ("X" - thousands of repetitions of known pali (randomly placed) in the input; Y -- seconds; A -- ee, B -- S::E). But perhaps an input can be constructed which results in catastrophic slowdown in A and/or B. I have no proof i.e. can't write scientific PDF papers.
+ 2.5 +-------------------------------------------------+ | + + + + + + + + + | | | | | | B | | | | B | 2 |-+ +-| | B | | B | | | | B | | | | B | 1.5 |-+ B +-| | | | B | | | | B | | | | B | | B | 1 |-+ +-| | B | | | | B | | B A | | A A | | B A A | 0.5 |-+ A A +-| | B A A | | B A | | A A | | B A A | | B A A | | A A + + + + + + + + | 0 +-------------------------------------------------+ 0 2 4 6 8 10 12 14 16 18 20 use strict; use warnings; use feature 'say'; use Data::Dump 'dd'; use Time::HiRes 'time'; use String::Eertree; use lib '.'; use Eertree 'ee'; use Benchmark qw/ timeit :hireswallclock /; use Chart::Gnuplot; use File::Spec::Functions 'catfile'; use File::Basename 'dirname'; my $GP = catfile dirname( $^X ), '../../c/bin/gnuplot.exe'; die unless -x $GP; # assume Strawberry Perl use constant REPEATS => 5; STDOUT-> autoflush( 1 ); sub get_sample { my $n = shift; my $str = 'amanaplanacanalpanama'; my $s = ''; for ( 1 .. $n * 1000 ) { substr $s, rand( length $s ), 0, $str } return $s } my ( @xdata, @ydata1, @ydata2, @datasets ); for my $n ( 1 .. 20) { print "$n "; my $s = get_sample( $n ); push @xdata, $n; { my $t = timeit REPEATS, sub { ee( $s, 1 )}; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @ydata1, $y; } { my $t = timeit REPEATS, sub { String::Eertree-> new( string => + $s )}; my $y = $t-> [ 1 ] / $t-> [ -1 ]; push @ydata2, $y; } } print "\n\n"; push @datasets, Chart::Gnuplot::DataSet-> new( xdata => \@xdata, ydata => \@ydata1, style => 'points', ), Chart::Gnuplot::DataSet-> new( xdata => \@xdata, ydata => \@ydata2, style => 'points', ); my $chart = Chart::Gnuplot-> new( gnuplot => $GP, terminal => 'dumb size 60, 40', ); $chart-> plot2d( @datasets );
Actually, thank you very much, @choroba, for introducing me to the subject. Very entertaining.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^7: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by Anonymous Monk on Aug 14, 2025 at 21:27 UTC |