That's a great guess, but the usual culprit in my code is /\G(?=...)/. JavaScript has some parsing quirks, and zero-width assertions help to identify the context a token is in. For example, "function" is a keyword, but it may now also be used as a property name of an object. I'm trying to avoid analyzing the higher-level syntax just to tokenize it.
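To make the idea concrete, here is a minimal sketch of using a zero-width /\G(?=...)/ check to classify a word by what follows it. The sub name word_kind and its rules are hypothetical, not the code from this thread, and the rules are deliberately naive: any word followed by a colon is called a property, which is wrong for the ternary operator and for labels.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the word starting at $offset in $src.
# Both checks are zero-width, so nothing is consumed.
sub word_kind {
    my ($src, $offset) = @_;
    pos($src) = $offset;                      # anchor \G at the word

    # A word followed by ':' is assumed to be a property name (naive rule).
    return 'property' if $src =~ /\G(?=[A-Za-z_\$][\w\$]*\s*:)/gc;

    # Otherwise "function" on its own is treated as the keyword.
    return 'keyword'  if $src =~ /\G(?=function\b)/gc;

    return 'identifier';
}

print word_kind('{ function: 1 }', 2), "\n";
print word_kind('function f() {}', 0), "\n";
print word_kind('x = 1',           0), "\n";
```

Because the lookaheads consume nothing, the tokenizer can go on to match the actual token at the same position afterwards.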

I'm working on a simple test case now, which is turning out more complicated than I thought. I'll post it when I get it.

Re^3: Is there a way to allow consecutive zero-length matches without using pos()?
by ikegami (Patriarch) on Nov 08, 2017 at 08:05 UTC

    I'm trying to avoid analyzing the higher-level syntax just to tokenize it.

    Apparently not, since you are trying to ascribe meaning to function. That's the job of the parser, not the tokenizer.

      Well, maybe another example would be clearer: To correctly tokenize the code "if (expr) /...", you have to know that "if" is a keyword. Otherwise, it looks like a function call followed by a division operator, rather than an if statement that includes a regular expression. And then there's automatic semicolon insertion, which has to happen when the higher-level syntax doesn't allow the two tokens surrounding a linefeed to be consecutive.
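The "if (expr) /..." case can be sketched with a small amount of tokenizer state instead of a full parse. The following is a hypothetical illustration, not the code discussed in this thread: the names tokenize, %PAREN_KEYWORD and $regex_ok are made up, and it assumes a '/' starts a regex literal after most punctuation and after the ')' that closes an if/while/for/with head. It ignores many real cases (for example "return /re/", "a[0] / 2", character classes containing '/', and automatic semicolon insertion).

```perl
use strict;
use warnings;

# Keywords whose parenthesized head is followed by a statement, so a '/'
# right after the matching ')' starts a regex literal, not a division.
my %PAREN_KEYWORD = map { $_ => 1 } qw(if while for with);

# Returns a list of [type, text] pairs. A sketch, not a full lexer.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    my @paren_stack;     # per open '(': was it preceded by if/while/for/with?
    my $prev     = '';   # text of the previous non-whitespace token
    my $regex_ok = 1;    # may a '/' at this point start a regex literal?

    pos($src) = 0;
    while (pos($src) < length $src) {
        next if $src =~ /\G\s+/gc;                 # skip whitespace

        my ($type, $text);
        if ($src =~ /\G([A-Za-z_\$][\w\$]*)/gc) {
            ($type, $text, $regex_ok) = ('word', $1, 0);
        }
        elsif ($src =~ /\G(\d+)/gc) {
            ($type, $text, $regex_ok) = ('number', $1, 0);
        }
        elsif ($src =~ /\G\(/gc) {
            # Remember whether this '(' opens an if/while/for/with head.
            push @paren_stack, ($PAREN_KEYWORD{$prev} ? 1 : 0);
            ($type, $text, $regex_ok) = ('punct', '(', 1);
        }
        elsif ($src =~ /\G\)/gc) {
            my $after_keyword = @paren_stack ? pop @paren_stack : 0;
            ($type, $text, $regex_ok) = ('punct', ')', $after_keyword);
        }
        elsif ($regex_ok && $src =~ m{\G(/(?:[^/\\\n]|\\.)+/[a-z]*)}gc) {
            ($type, $text, $regex_ok) = ('regex', $1, 0);
        }
        else {
            $src =~ /\G(.)/gcs;                    # any other single character
            ($type, $text, $regex_ok) = ('punct', $1, 1);
        }
        push @tokens, [ $type, $text ];
        $prev = $text;
    }
    return @tokens;
}

for my $code ('if (x) /re/.test(s)', 'f (x) / 2') {
    print join(' ', map { "$_->[0]($_->[1])" } tokenize($code)), "\n";
}
```

In the first input the slash after ")" comes out as a regex token because the matching "(" followed "if"; in the second it stays a division operator because "f" is an ordinary word.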

      I'm not a parsing expert (had to figure it out myself), so I'm happy to hear of any better approaches. Any solution has to be fast.

      In any case, I'm still looking for an alternative to setting pos() to allow consecutive zero-length matches on UTF-8 strings. Hoping not to rewrite the whole routine to avoid zero-length matches.
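For readers unfamiliar with the behavior in question, here is a minimal sketch of it, based on my reading of the "Repeated Patterns Matching a Zero-length Substring" section of perlre and the pos entry in perlfunc. The input string and patterns are made up for illustration. After a zero-length /g match, Perl refuses another zero-length match at the same position on that string until pos() is assigned to, which clears the flag.

```perl
use strict;
use warnings;

my $src = 'function f() {}';
my @seen;

pos($src) = 0;

# First zero-width check at offset 0.
push @seen, ($src =~ /\G(?=function\b)/gc ? 'match' : 'no match');

# A second zero-width check at the same offset is rejected, because the
# previous match on this string was zero-length at this position.
push @seen, ($src =~ /\G(?=[a-z])/gc ? 'match' : 'no match');

# Assigning to pos() resets the "matched with zero length" flag.
pos($src) = pos($src);

# The same check is now allowed.
push @seen, ($src =~ /\G(?=[a-z])/gc ? 'match' : 'no match');

print join(', ', @seen), "\n";
```

With the flag behavior described in perlre, this should report a match, then a failure, then a match. The sketch only demonstrates the workaround the question is trying to avoid; it does not offer an alternative to it.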