regex bottom line?

So, assuming that I'll need to roll my own parse_line by modifying the regex... what regex will provide the same functionality but work for arbitrarily large strings?

Since I still don't really understand what /(?!\1)[^\\]/ does, I am having trouble with this... I reason that it should match anything that's not a quote (whichever quote was opened at the start of the match), but I don't see how it does this...

Should I use tye's first regex? I also don't get how /((?:\\.|[^'"\\]+|(?!\1)['"])*)/ works...
Does
/[^'"\\]+|(?!\1)['"]/
do the same thing as
/(?!\1)[^\\]/
?

--
3dan


Replies are listed 'Best First'.

Re: regex bottom line? (bottom method)
by tye (Sage) on May 12, 2003 at 16:13 UTC

The only method that supports arbitrary strings is the last one, as I demonstrated.

Does
/[^'"\\]+|(?!\1)['"]/
do the same thing as
/(?!\1)[^\\]/
?

No. But /[^'"\\]|(?!\1)['"]/ (note that I removed the "+") and /(?!\1)[^\\]/ are the same (provided \1 is either "'" or '"'). That is, they each match a single character that is not a backslash (\), nor the same as the quote character in \1.
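To check that claim rather than take it on faith, here is a minimal sketch that tries every single-byte character after each kind of opening quote and compares the two forms. The helper names `matches_short` and `matches_long` are made up for this illustration, and the leading `^(['"])` is only there so that \1 has something to refer to.

```perl
use strict;
use warnings;

# Hypothetical helpers: each captures an opening quote as \1, then tests
# whether the very next character is accepted by one of the two forms.
sub matches_short {
    my ($q, $c) = @_;
    return "$q$c" =~ /^(['"])(?!\1)[^\\]/s ? 1 : 0;
}

sub matches_long {
    my ($q, $c) = @_;
    return "$q$c" =~ /^(['"])(?:[^'"\\]|(?!\1)['"])/s ? 1 : 0;
}

# Collect every (quote, character) pair on which the two forms disagree.
my @mismatches;
for my $q (q{'}, q{"}) {
    for my $c (map { chr } 0 .. 255) {
        push @mismatches, [$q, $c]
            if matches_short($q, $c) != matches_long($q, $c);
    }
}
print "mismatches: ", scalar(@mismatches), "\n";
```

Both forms reject a backslash and the quote that opened the string, and accept everything else, including the other kind of quote.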

Since the regex is matching zero or more occurrences of X or Y or Z, it also works to match zero or more occurrences of X or Y+ or Z.

Replacing Y with Y+ means each pass through the surrounding (?: ... )* can grab a whole run of "uninteresting" characters at once, so we loop far fewer times (which matters, since we've seen that we are only allowed to loop through it about 32k times).
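To make the difference in loop count visible, the sketch below adds a `(?{ ... })` counter to both forms and runs them on the same quoted string. The surrounding `^(['"])( ... )\1` wrapper, the `$loops` variable, and the `body_and_passes` helper are assumptions made for this illustration, not the exact code from the earlier posts.

```perl
use strict;
use warnings;

our $loops = 0;   # package variable, updated from inside the regexes

# Same alternations as discussed above, plus a (?{ ... }) counter that
# fires once per completed pass through the (?: ... )* group.
my $single  = qr/^(['"])((?:(?:\\.|(?!\1)[^\\])(?{ $loops++ }))*)\1/s;
my $chunked = qr/^(['"])((?:(?:\\.|[^'"\\]+|(?!\1)['"])(?{ $loops++ }))*)\1/s;

# Hypothetical helper: returns (body, passes), or an empty list on failure.
sub body_and_passes {
    my ($re, $text) = @_;
    local $loops = 0;
    my ($quote, $body) = $text =~ $re or return;
    return ($body, $loops);
}

# A double-quoted string: 1000 a's, an escaped quote, then 1000 b's.
my $text = q{"} . ('a' x 1000) . q{\"} . ('b' x 1000) . q{"};

my ($body1, $passes1) = body_and_passes($single,  $text);
my ($body2, $passes2) = body_and_passes($chunked, $text);
print "single: $passes1 passes, chunked: $passes2 passes\n";
```

Both patterns capture the same body. The one-character form needs one pass per character (plus one for the escape), while the Y+ form needs one pass per run, so the loop count grows with the number of quotes and backslashes rather than with the length of the string.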

tye
