Re^2: Perl regexp matching is slow??

Thanks. You'll get some pushback from the curmudgeons, but by and large I think we're a healthy enough community that we're actually trying to learn from our mistakes, at least every now and then. Certainly that applies to regex syntax; I think it's somewhat ironic that, just as the rest of the world is adopting Perl 5 regex syntax, Perl 6 is abandoning it. :-)

Anyway, I'd be very interested in any suggestions you might have regarding the hybrid approach we're taking in Synopsis 5, especially the implicit longest-token semantics.

Comment on Re^2: Perl regexp matching is slow??

Replies are listed 'Best First'.
Re^3: Perl regexp matching is slow?? by rsc (Acolyte) on Jan 31, 2007 at 22:15 UTC
It certainly sounds like distinguishing \| and \|\| is a good idea. I do not claim to fully understand everything in Synopsis 5, but the general idea seems like being on the right track. One interesting gotcha (I can't tell whether it is handled correctly or not). Consider the Perl 5 regular expression `(.)\1` run on `(("a" x $n) . "b") x 2`. It should match. However, figuring that out requires trying many possible matches for the `(.)` part: the first try would be the whole string, then the whole string minus one char, etc., until you got down to just half the string. It is not clear to me that the longest-token semantics would get this right: it sounds like they would grab the whole string during `(.*)`, fail to match in the equivalent of `\1`, and then give up. All this is just to say that backreferences are even harder than they first appear, not that the longest-token idea is flawed.	[reply] [d/l] [select]
Re^3: Perl regexp matching is slow?? by demerphq (Chancellor) on Feb 01, 2007 at 08:36 UTC
Going by my experience with the trie optimisation the only difference between leftmost-longest and longest-token will be that you dont need to sort the possible accepting pathways through an alternation when cycling through them to find the one that allows the following pattern to succeed. Assuming you use a stack to track all the possible accepting pathways then in the case of longest-token they will already be sorted so you just keep popping them off the stack until one matches. The trie code however has to somehow process them in the order they occur in the alternation, which in the perl 5.9.5 implementation involves a linear probe through the list to find the best match each time. (The assumption is that the list will most often be very small.) --- $world=~s/war/peace/g	[reply]