in reply to Re^7: Nonrepeating characters in an RE
in thread Nonrepeating characters in an RE
I think this is rather an answer to Re^6: Nonrepeating characters in an RE (performance).
Oops, yes; I must have misclicked. If it's possible to have it moved, the conversation may make more sense.
And you tested the "worst case" of a pattern like adieu which resulted in 8632 hits.
Worst case for the problem in general, or for one of the solutions? For some reason my dictionary has both "Mississippi" and "mississippi", so I get 2 results:
    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?^:(.)(?!\1)(.)(?!\1|\2)(.)\3\2\3\3\2(?!\1|\2|\3)(.)\4\2)$/ ' /usr/share/dict/hv.words | wc -l
    0.08
    2
    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?:(.)(.)(.)\3\2\3\3\2(.)\4\2$(?(?{ 4 != uniq $1, $2, $3, $4 })(?!)))/ ' /usr/share/dict/hv.words | wc -l
    0.09
    2
Of course, if scanning the words in a dictionary is the OP's actual intent, all these solutions are amply fast for any likely template.
You didn't tell us the Perl version and I can't see a use re 'eval' happening, so no info about the observed slow down with newer versions.
This was my system perl, v5.26.1 for Ubuntu ("with 71 registered patches"). I was just curious about the relative speed of the different solutions. No use re 'eval' was needed, since I was providing the regexp directly without interpolating variables. (That should only make a difference at regexp-compile time; it shouldn't affect runtime at all.)
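To illustrate the point about use re 'eval', here is a minimal sketch. The small "abab" pattern and all variable names are made up for the demonstration; they are not the patterns timed above. It only shows that a literal regexp containing (?{ ... }) compiles without the pragma, while the same source interpolated from a string needs it.

```perl
use strict;
use warnings;

# Literal pattern: the (?{ ... }) block is compiled together with the
# surrounding Perl code, so "use re 'eval'" is not required.
my $literal = 'abab' =~ /^(.)(.)\1\2$(?(?{ $1 eq $2 })(?!))/ ? 1 : 0;

# Same pattern held in a string (hypothetical variable name) and
# interpolated at runtime: without the pragma, perl refuses to compile it.
my $source  = '^(.)(.)\1\2$(?(?{ $1 eq $2 })(?!))';
my $without = eval { 'abab' =~ /$source/ ? 1 : 0 };
my $error   = $@;

# With the pragma in lexical scope, the interpolated pattern is accepted.
my $with = do { use re 'eval'; 'abab' =~ /$source/ ? 1 : 0 };

print "literal:         $literal\n";
print "without pragma:  ", (defined $without ? $without : "died: $error");
print "with pragma:     $with\n";
```

Once compiled, both forms are the same kind of regexp, which is why I would not expect the pragma to matter for the timings.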
(but it's important to get that anchor in the right place)
If the anchor is at the end, your (re eval) solution still works, but does loads more work; so it's an optimization failure rather than a bug.
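A rough way to see the extra work is to count how often the code block runs for each anchor position. This is a sketch only: the "abba" template, the word list and the names $calls and run_pattern are invented for illustration, and the counts on a real dictionary will depend on the template.

```perl
use strict;
use warnings;

our $calls = 0;    # incremented each time a (?{ ... }) block is executed

# Hypothetical "abba" template with two distinct letters.
# Anchor before the conditional: the code block runs only for full-length candidates.
my $anchor_first = qr/^(.)(.)\2\1$(?(?{ $calls++; $1 eq $2 })(?!))/;
# Anchor after the conditional: the code block also runs for longer words.
my $anchor_last  = qr/^(.)(.)\2\1(?(?{ $calls++; $1 eq $2 })(?!))$/;

# Run one pattern over the words; return (code-block calls, matching words).
sub run_pattern {
    my ($re, @words) = @_;
    local $calls = 0;
    my @hits = grep { $_ =~ $re } @words;
    my $n = $calls;
    return ($n, \@hits);
}

my @words = qw(abba noon aaaa abbax noonday otto deed);
my ($early, $early_hits) = run_pattern($anchor_first, @words);
my ($late,  $late_hits)  = run_pattern($anchor_last,  @words);

print "anchor first: $early calls, hits: @$early_hits\n";
print "anchor last:  $late calls, hits: @$late_hits\n";
```

Both variants should report the same hits; only the number of code-block calls differs.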
I made that mistake, and mentioned it in particular, because AnomalousMonk's test script in 11146269 called your function to get the regexp and then wrapped anchors around it. I guess for cases like this it would be better to have the function include the anchors.
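For what it's worth, a generator that returns an already anchored regexp might look something like the following. This is not the function from the thread: template_regex is a hypothetical name, and it builds the negative-lookahead form rather than the (re eval) form, purely to show the anchors living inside the function.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a template such as "mississippi" into an anchored
# regexp in which equal letters must be equal and different letters must differ.
sub template_regex {
    my ($template) = @_;
    my %group;        # template letter => capture group number
    my $n    = 0;     # number of capture groups so far
    my $body = '';
    for my $ch (split //, $template) {
        if (my $g = $group{$ch}) {
            $body .= "\\g{$g}";                  # repeated letter: backreference
        }
        else {
            # new letter: must differ from every earlier capture
            $body .= '(?!' . join('|', map { "\\g{$_}" } 1 .. $n) . ')' if $n;
            $body .= '(.)';
            $group{$ch} = ++$n;
        }
    }
    return qr/^$body$/;    # anchors are added here, not by the caller
}

my $re = template_regex('mississippi');
print "$_\n" for grep { $_ =~ $re } qw(mississippi Mississippi aaaaaaaaaaa mississippis);
```

With the anchors inside, a caller cannot accidentally put them in the wrong place relative to anything the function appends.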