in reply to Re^4: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
in thread Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
Rate rosetta_old rosetta_new eertree rosetta_old 20.0/s -- -98% -98% rosetta_new 864/s 4222% -- -27% eertree 1177/s 5784% 36% -- Rate rosetta_old rosetta_new eertree rosetta_old 2.63/s -- -99% -100% rosetta_new 215/s 8058% -- -67% eertree 654/s 24749% 205% -- 1..2
I missed the - $n because I was trying to get all the palindromes from the Rosetta code instead of just the unique ones. It seemed the new code worked correctly, but the old one didn't, so I started experimenting with replicating its incorrect output...
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^6: Possessive sub-pattern with non-greedy content + recursion: WHY does this work??
by Anonymous Monk on Aug 13, 2025 at 21:55 UTC | |
Heh, just a week later and I kicked myself hard once and again to finally read the PDF paper in earnest, and, of other sources, only a less cruel text linked in a footnote in said PDF. The blog entry, linked from PWC #145 task #2, didn't open for me, which I didn't pursue. I.e. I didn't consult anyone's code before writing this implementation (speed, clarity, interface have, of course, room for improvement):
Then I checked PWC #146 i.e. next one, for recap; there were _very_ few blog links (it was New Year's vacation time, unfortunately), and only _one_ says its Perl is Python's port of _real_ eertree. This one I ran and found just too slow for short inputs; then it gave "deep recursion" (what???) warnings for still too short but slightly longer input. I terminated the process it took too long. Then I checked only 2 more random GH directory PWC #145 entries; didn't find eertree there neither. Found a funny excuse _not_ to do an eertree :-). I'd be _really_ grateful if someone points me to a GH PWC #145 Perl's eertree code. Which leaves String::Eertree only, I think. With a harness as in (grand)*parent node:
Which, I guess, is due to "Perl's subs/methods calls are too expensive" (_lots_ of calls in S::E internally) and/or "Moo"? or something else? But wait, comparison above is unfair ("give all advantage etc.") because 'ee' skips some work if only unique palindromes are required. Let's rather do this:
I.e. contestants consume their strings, then build internal structures as required, without producing unique or non-unique lists (note: 'ee' also provides offsets for each (non-unique) pali which S::E doesn't. Not too much work, but still...) Yeah, that's more fair. In fact random chars, very short (200 _is_ short) strings don't look interesting. I tried what follows with (what looked at the time as) more entertaining input -- a string up to 400+k chars with "real" pali randomly interspersed within. Maybe it's as uninteresting as previous. The purpose was to also check the linearity i.e. O(n)-ness which, as I squinted in a skewed way (guess in which favour) didn't look right to me at first. False alarm, both appear O(n) ("X" - thousands of repetitions of known pali (randomly placed) in the input; Y -- seconds; A -- ee, B -- S::E). But perhaps an input can be constructed which results in catastrophic slowdown in A and/or B. I have no proof i.e. can't write scientific PDF papers.
Actually, thank you very much, @choroba, for introducing me to the subject. Very entertaining. | [reply] [d/l] [select] |
by Anonymous Monk on Aug 14, 2025 at 21:27 UTC | |
I'd be _really_ grateful if someone points me to a GH PWC #145 Perl's eertree code. Please don't, I completely forgot how to navigate PWC recap, and only checked "Blogs with Creative Title". There aren't too many not-empty GH sub-directories anyway. There, a few regex-based solutions, and quite a few Rosetta-Perl-snippet-like, but correct/fast (in O(N2) sense) -- a plenty to choose from to benchmark against instead of really really bad Rosetta Perl sample :-). Of eertree solutions: Mr. roger-bell-west's solution (gonna mention nicknames w/o people knowing... how else to refer to code snippets?) gives only 'a ama m ana n p l c' for 'amanaplanacanalpanama' input. They say it's Rosetta Python's port, but then it's broken port, had to disqualify, sorry. Mr. polettix' code generates "Use of uninitialized value in array element" warnings for non-trivial inputs, and 'ertreetre' "palindrome" for 'eereertreetree' input. Had to disqualify, sorry. Others seem to be correct, I gave them 1+ MB string input (50K randomly interspersed 'amanaplanacanalpanama's, similar to parent node), then in seconds:
All but choroba's are able to generate unique PD's only; so it's not the same league/work they do. Judging by "ee" vs. "ee(1)", it's 1:2 ratio of work/time. As imperfect test as it is, but it reveals the fastest of them all (subroutines which modify lexicals in enclosing scope(s)? Assignment operator w/o space on each side? Go figure) -- i.e. until "ee" arrival 3.5 years later:
"ee" is a call to return unique PD's, "ee(1)" is "find them all". BTW, looking at Mr.james-smith and Mr.wlmb code/comments ("growing a PD on each side of a 'center'"?) -- it's either they do not quite "eertree" as described in PDF, or it's somewhat modified algorithm described elsewhere I haven't seen. | [reply] [d/l] [select] |