in reply to Re^4: Nonrepeating characters in an RE
in thread Nonrepeating characters in an RE
I was quite pleasantly surprised its speed. I notice, however, that the code runs about three times more slowly under 5.30 than under 5.8. I assume this is due to the many modifications made to the regex engine over the years to accommodate Unicode. Any comment on this would be of interest.
3x slower seems an awful lot. Are both perls built with the same options?
Unicode support could be part of it: it would be worth redoing the timings under 5.30 supplying either /a or /aa as a flag on the regexp. However for these patterns (ASCII input and pattern, no use of \w-style classes) I'd expect the cost between versions to be low, and the benefit of the /a flags to be zero.
There have also been numerous small changes added in recent years as a result of bugs (mostly found by fuzzers) that had the potential to be security holes. Such changes almost always make things a tiny bit slower, and as you accumulate more and more of them they add up. So it might also be informative to run the regexp tests from 5.30 under 5.8 (which might need some adaptation) to see a sample of the bugs 5.30 doesn't have.
If you're doing timings, I'd also be interested how my code from 11146164 compares - I'd expect it to win a lot by avoiding the embedded code block, and give back a fraction of that by doing more backtracking. (You should add at least the /s flag to my qr{} for proper comparison.)
# (what is proper match behavior of empty template?)
Since we're assuming implicit anchors, I think it should just match the empty string. That wasn't a case I attempted to handle though, and I doubt the OP cares about it.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^6: Nonrepeating characters in an RE (performance)
by LanX (Saint) on Aug 21, 2022 at 18:29 UTC |