in reply to Re^3: Is there a way to allow consecutive zero-length matches without using pos()?
Well, maybe another example would be clearer: to correctly tokenize the code "if (expr) /...", you have to know that "if" is a keyword. Otherwise, it looks like a function call followed by a division operator, rather than an if statement whose body starts with a regular expression. And then there's automatic semicolon insertion, which has to happen when the higher-level syntax doesn't allow the two tokens on either side of a linefeed to be consecutive.
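To make the "if (expr) /..." case concrete, here is a rough sketch of the decision the tokenizer has to make when it reaches a "/". It is only an illustration, not my actual routine: the names slash_kind and %paren_keyword are made up for this post, the keyword list is an assumption, and it ignores cases such as a "/" after "return" or after a closing brace.

```perl
use strict;
use warnings;

# Hypothetical keyword set: a ")" that closes one of these is followed
# by a statement, so a "/" right after it starts a regex literal.
my %paren_keyword = map { $_ => 1 } qw(if while for with);

# Decide whether a "/" is a regex start or a division operator.
#   $prev   - the token immediately before the "/" (undef at start of input)
#   $opener - when $prev is ")", the token before the matching "("
sub slash_kind {
    my ($prev, $opener) = @_;
    return 'regex' unless defined $prev;
    if ($prev eq ')') {
        return (defined $opener && $paren_keyword{$opener}) ? 'regex' : 'divide';
    }
    # After an identifier, a number, or "]" we are in operator position.
    # (Simplification: keywords such as "return" are not handled here.)
    return 'divide' if $prev =~ /\A(?:[\w\$]+|\])\z/;
    return 'regex';
}

print slash_kind(')', 'if'),  "\n";   # if (expr) /.../   prints "regex"
print slash_kind(')', 'foo'), "\n";   # foo(expr) / 2     prints "divide"
```

The point is that the answer depends on a token several positions back (whatever precedes the matching parenthesis), so the tokenizer cannot decide from the local characters alone.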
I'm not a parsing expert (had to figure it out myself), so I'm happy to hear of any better approaches. Any solution has to be fast.
In any case, I'm still looking for an alternative to setting pos() to allow consecutive zero-length matches on UTF-8 strings. I'm hoping not to have to rewrite the whole routine just to avoid zero-length matches.