in reply to •Re: Re: Text::ParseWords regex doesn't work when text is too long? (fixes)
in thread Text::ParseWords regex doesn't work when text is too long?

So, assuming that I'll need to roll my own parse_line by modifying the regex... what regex will provide the same functionality but work for arbitrarily large strings?

Since I still don't really understand what /(?!\1)[^\\]/ does, I am having trouble with this... I reason that it should match anything that's not a quote (whichever quote was opened at the start of the match), but I don't see how it does this...

Should I use tye's first regex? I also don't get how /((?:\\.|[^'"\\]+|(?!\1)['"])*)/ works...
Does
/[^'"\\]+|(?!\1)['"]/
do the same thing as
/(?!\1)[^\\]/
?

--
3dan

Re: regex bottom line? (bottom method)
by tye (Sage) on May 12, 2003 at 16:13 UTC

    The only method that supports arbitrary strings is the last one, as I demonstrated.

    Does
    /[^'"\\]+|(?!\1)['"]/
    do the same thing as
    /(?!\1)[^\\]/
    ?

    No. But /[^'"\\]|(?!\1)['"]/ (note that I removed the "+") and /(?!\1)[^\\]/ are the same (provided \1 is either "'" or '"'). That is, each matches a single character that is neither a backslash (\) nor the quote character held in \1.
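    One way to check that claim is to list which characters each form accepts right after an opening quote. The following is only a sketch: the helper accepted() and the names $short and $long are made up for illustration, and it only tries printable ASCII.

```perl
use strict;
use warnings;

# Hypothetical helper: return the printable ASCII characters that the
# given regex accepts as the single character following $quote.
sub accepted {
    my ($re, $quote) = @_;
    return join '', grep { ($quote . $_) =~ $re } map { chr } 32 .. 126;
}

# Both patterns capture the opening quote in \1, then match one character.
my $short = qr/^(['"])(?:(?!\1)[^\\])$/;
my $long  = qr/^(['"])(?:[^'"\\]|(?!\1)['"])$/;

for my $q (q{'}, q{"}) {
    my $same = accepted($short, $q) eq accepted($long, $q);
    print "opening quote $q: ", ($same ? "same set" : "different sets"), "\n";
}
```

    In both forms the (?!\1) lookahead consumes nothing; it only refuses to go on if the next character is the quote that was opened.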

    Since the regex is matching zero or more occurrences of X or Y or Z (here X is \\., Y is [^'"\\], and Z is (?!\1)['"]), it also works to match zero or more occurrences of X or Y+ or Z.

    Replacing Y with Y+ means we can grab tons of "uninteresting" characters quickly so that we don't have to loop through the surrounding (?: ... )* so many times (since we've seen that we are only allowed to loop through it 32k times).
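    To see the difference in loop counts, one can put a (?{ ... }) code block at the top of the (?: ... )* group and count how often it runs. This is a rough sketch rather than anything from Text::ParseWords: the sub names and the $passes counter are hypothetical, and the test string is kept short so that neither version gets near the 32k limit.

```perl
use strict;
use warnings;

our $passes;    # package variable, incremented inside the regexes below

# Y as a single character: one pass through the group per character.
sub passes_single {
    my ($text) = @_;
    $passes = 0;
    $text =~ /^(["'])(?:(?{ $passes++ })(?:\\.|[^'"\\]|(?!\1)['"]))*\1$/
        or die "no match";
    return $passes;
}

# Y+ : a whole run of "uninteresting" characters is taken in one pass.
sub passes_plus {
    my ($text) = @_;
    $passes = 0;
    $text =~ /^(["'])(?:(?{ $passes++ })(?:\\.|[^'"\\]+|(?!\1)['"]))*\1$/
        or die "no match";
    return $passes;
}

my $text = '"' . ('x' x 1000) . '"';
print "single-character Y: ", passes_single($text), " passes\n";
print "Y+:                 ", passes_plus($text),   " passes\n";
```

    The counter also ticks for the final pass that fails and ends the loop, so treat the numbers as a comparison between the two forms, not as exact iteration counts.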

                    - tye