Re^5: Nonrepeating characters in an RE

Re^6: Nonrepeating characters in an RE by hv (Prior) on Aug 21, 2022 at 21:10 UTC
Here are some raw timings I get, using the extensive word list I have for crosswords:

    % perl -nwle 'END{ warn +(times)[0],"\n" } print $_' /usr/share/dict/hv.words | wc -l
    0.04
    285520
    % perl -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(.)(?!\1)(.)(?!\1|\2)(.)(?!\1|\2|\3)(.)(?!\1|\2|\3|\4)(.)$/' /usr/share/dict/hv.words | wc -l
    0.13
    8632
    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?:(.)(.)(.)(.)(.)(?(?{ 5 != uniq $1, $2, $3, $4, $5 })(?!)))$/' /usr/share/dict/hv.words | wc -l
    0.39
    8632
    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(.)(.)(.)(.)(.)$/ && 5 == uniq $1, $2, $3, $4, $5' /usr/share/dict/hv.words | wc -l
    0.1
    8632
    %

Oh, the anchor needs to come before the code block!

    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?:(.)(.)(.)(.)(.)$(?(?{ 5 != uniq $1, $2, $3, $4, $5 })(?!)))/' /usr/share/dict/hv.words | wc -l
    0.12
    8632
    %

I guess I was wrong (but it's important to get that anchor in the right place).
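For anyone who wants to try the comparison without a large word list, here is a minimal sketch of two of the approaches timed above: the negative-lookahead regex, and the plain match followed by a distinctness check. The names `distinct_chars_re` and `distinct_chars_filter` are hypothetical, and a hash stands in for `List::MoreUtils::uniq` so the sketch needs no CPAN modules. It is an illustration under those assumptions, not the code that produced the timings.

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regex that matches exactly $n
# characters, all different, using one negative lookahead per new group.
sub distinct_chars_re {
    my ($n) = @_;
    my $re = '(.)';
    for my $i (2 .. $n) {
        # \1|\2|...: every group captured so far
        my $seen = join '|', map { '\\' . $_ } 1 .. $i - 1;
        $re .= "(?!$seen)(.)";
    }
    return qr/^$re$/;
}

# Hypothetical helper: the post-filter variant. Check the length first,
# then count distinct characters in plain Perl.
sub distinct_chars_filter {
    my ($word, $n) = @_;
    return 0 unless length($word) == $n;
    my %seen;
    $seen{$_}++ for split //, $word;
    return keys(%seen) == $n ? 1 : 0;
}

for my $word (qw(adieu hello crane)) {
    my $by_regex  = $word =~ distinct_chars_re(5) ? 'match' : 'no match';
    my $by_filter = distinct_chars_filter($word, 5) ? 'match' : 'no match';
    print "$word: regex=$by_regex filter=$by_filter\n";
}
```

For `$n == 5` the generated pattern has the same shape as the hand-written lookahead regex in the second timing. Both helpers should agree on every input; only the amount of work done inside the regex engine differs.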
Re^7: Nonrepeating characters in an RE by LanX (Saint) on Aug 21, 2022 at 21:42 UTC
I think this is rather an answer to Re^6: Nonrepeating characters in an RE (performance).

And you tested the "worst case" of a pattern like `adieu`, which resulted in 8632 hits.

And it turns out that my intuition, that a posteriori filtering outside the regex is a sufficient approach, wasn't too bad.

You didn't tell us the Perl version and I can't see a `use re 'eval'` happening, so no info about the observed slowdown with newer versions.

> (but it's important to get that anchor in the right place)

Yes, that's a lesson I already had to learn for this task in Re: Merging multiple variations of a serial number (regex as "mini prolog").

Otherwise the regex will never reach an anchor placed behind a FAIL, so longer words will be checked too.

The OP didn't tell us which anchors he plans to use, so ...

There is also `(*PRUNE)` to be considered to avoid unwanted backtracking, but in this case we have no quantifiers to spawn a tree anyway.

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
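To make the anchor point concrete, the following sketch counts how often the embedded code block runs with the `$` anchor after versus before the conditional. The counter `$calls` and the helpers `all_distinct` and `count_calls` are hypothetical names introduced for illustration, and `all_distinct` replaces `uniq` from the one-liners. The exact counts may depend on the regex optimizer of the perl in use, so none are claimed here.

```perl
use strict;
use warnings;

our $calls = 0;    # how often the embedded code block ran

# True when every argument is a different string.
sub all_distinct {
    my %seen;
    $seen{$_}++ for @_;
    return keys(%seen) == @_;
}

# Anchor after the conditional: the code block can run for any word
# with at least five characters.
my $late = qr/^(?:(.)(.)(.)(.)(.)(?(?{ $calls++; !all_distinct($1, $2, $3, $4, $5) })(?!)))$/;

# Anchor before the conditional: words longer than five characters
# fail at the anchor, before the code block is reached.
my $early = qr/^(?:(.)(.)(.)(.)(.)$(?(?{ $calls++; !all_distinct($1, $2, $3, $4, $5) })(?!)))/;

# Run one regex over a word list; return (code block calls, matches).
sub count_calls {
    my ($re, @words) = @_;
    $calls = 0;
    my @hits = grep { $_ =~ $re } @words;
    return ($calls, scalar @hits);
}

my @words = qw(adieu hello adieus mississippi cat);
for my $case ([late => $late], [early => $early]) {
    my ($name, $re) = @$case;
    my ($n_calls, $n_hits) = count_calls($re, @words);
    print "$name anchor: $n_calls code block calls, $n_hits matches\n";
}
```

Both patterns should accept the same words. The expectation is that the early-anchor version reports fewer code block calls, which would be consistent with the 0.39 versus 0.12 timings reported earlier in the thread.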
Re^8: Nonrepeating characters in an RE by hv (Prior) on Aug 22, 2022 at 01:54 UTC
> I think this is rather an answer to Re^6: Nonrepeating characters in an RE (performance).

Oops, yes; I must have misclicked. If it's possible to have it moved, the conversation may make more sense.

> And you tested the "worst case" of a pattern like adieu which resulted in 8632 hits.

Worst case for the problem in general, or for one of the solutions? For some reason my dictionary has both "Mississippi" and "mississippi", so I get 2 results:

    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?^:(.)(?!\1)(.)(?!\1|\2)(.)\3\2\3\3\2(?!\1|\2|\3)(.)\4\2)$/' /usr/share/dict/hv.words | wc -l
    0.08
    2
    % perl -MList::MoreUtils=uniq -nwle 'END{ warn +(times)[0],"\n" } print $_ if /^(?:(.)(.)(.)\3\2\3\3\2(.)\4\2$(?(?{ 4 != uniq $1, $2, $3, $4 })(?!)))/' /usr/share/dict/hv.words | wc -l
    0.09
    2
    %

Of course if scanning the words in a dictionary is the OP's actual intent, all these solutions are amply fast enough for any likely template.

> You didn't tell us the Perl version and I can't see a use re 'eval' happening, so no info about the observed slow down with newer versions.

This was my system perl, v5.26.1 for Ubuntu ("with 71 registered patches") - I was just curious about the relative speed of the different solutions. No `use re 'eval'` was needed, since I was providing the regexp directly without interpolating variables. (That should only make a difference at regexp-compile time, it shouldn't affect runtime at all.)

> (but it's important to get that anchor in the right place)

If the anchor is at the end, your (re eval) solution still works, but does loads more work; so it's an optimization failure rather than a bug.

I made that mistake, and mentioned it in particular, because AnomalousMonk's test script in 11146269 called your function to get the regexp and then wrapped anchors around it. I guess for cases like this it would be better to have the function include the anchors.
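As a rough illustration of that last suggestion, here is a sketch of a generator that takes a template word and returns a regex with the anchors already built in. The name `template_re` is hypothetical; this is not the function from the earlier nodes, only one way such a helper could look. It uses the lookahead style, so no code block and no `use re 'eval'` is involved. With ten or more distinct letters the `\10` style backreferences get harder to read, and `\g{10}` may be the safer spelling.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a template word into an anchored regex that
# matches words with the same pattern of repeated and distinct letters.
sub template_re {
    my ($template) = @_;
    my %group;        # template letter => capture group number
    my $n    = 0;     # capture groups emitted so far
    my $body = '';
    for my $ch (split //, $template) {
        if (my $g = $group{$ch}) {
            # Letter seen before: must equal the earlier capture.
            $body .= '\\' . $g;
        }
        else {
            # New letter: must differ from every earlier capture.
            $body .= '(?!' . join('|', map { '\\' . $_ } 1 .. $n) . ')' if $n;
            $body .= '(.)';
            $group{$ch} = ++$n;
        }
    }
    # The anchors are part of the returned regex, so callers cannot
    # put them in the wrong place.
    return qr/^$body$/;
}

my $re = template_re('mississippi');
print "$_\n" for grep { $_ =~ $re } qw(mississippi Mississippi mississippis banana);
```

For the template `mississippi` the generated body is the same pattern as in the first one-liner above: `(.)(?!\1)(.)(?!\1|\2)(.)\3\2\3\3\2(?!\1|\2|\3)(.)\4\2`.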