in reply to Re: Regex weirdness?
in thread Regex weirdness?

Of course. What I'm currently using is this:
m/([^"']+|(?:"(?:[^"]|\\")*")|(?:'(?:[^']|\\')*'))/g
This does what I want. I'll look into the \G variant and see whether it makes sense to me (I occasionally have trouble wrapping my head around regex stuff). My intent is to split a block of text into a list of elements, alternating between quoted strings (with their quotes) and non-quoted strings. For example, the string print ( "some stuff", $more_stuff, "final stuff" ); should become this:
@list = ( q/print ( /, q/"some stuff"/, q/, $more_stuff, /, q/"final stuff"/, q/ );/ )
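
As a quick check of the behaviour described above, here is a minimal sketch that runs the pattern, unchanged, against the sample string. The variable names $text and @list are only illustrative; the input is single-quoted so that $more_stuff is not interpolated.

```perl
use strict;
use warnings;

# Sample input from the post; single quotes keep $more_stuff literal.
my $text = 'print ( "some stuff", $more_stuff, "final stuff" );';

# In list context with /g, a pattern with one capture group
# returns every captured chunk in order.
my @list = $text =~ m/([^"']+|(?:"(?:[^"]|\\")*")|(?:'(?:[^']|\\')*'))/g;

# Brackets make leading and trailing whitespace visible.
print "[$_]\n" for @list;
```

On this input the list should have five elements, alternating between unquoted and quoted chunks.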

Re^3: Regex weirdness?
by Roy Johnson (Monsignor) on Mar 15, 2005 at 03:37 UTC
    To get a backreference to a quote, you have to put the quote in parens, which means it is going to be returned as a separate group. So I think you're going to have to stay with the separate alternatives for each type of quote.

    The /x option is absolutely straightforward: any whitespace within your regex is ignored. So you can pretty it up as you like. You can also put comments in it. I recommend you jump right into using it.

    The \G anchor tells the pattern to resume looking from where it last left off with the string. I don't think it's going to help you with what you're trying to do here.
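
For anyone unsure what "resume where it left off" means in practice, here is a small sketch that is unrelated to the quote problem. The sample string and variable names are made up for illustration. With \G, each match has to start exactly at the previous match's end, and the /c flag keeps pos() intact when a match finally fails.

```perl
use strict;
use warnings;

my $text = 'aab,b';
my @runs;

# Each iteration must match at the current pos(); without \G,
# the engine would skip over the comma and also find the last "b".
while ($text =~ m/\G([a-z])/gc) {
    push @runs, $1;
}

# /c preserved pos() after the failed match, so it shows where scanning stopped.
print "matched: @runs, stopped at offset ", pos($text), "\n";
```

The loop collects the three letters before the comma and stops there, which is why \G is mostly useful for lexers that must notice unexpected input rather than for a one-shot list-context split.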

    I notice that the backslash protection of quotes doesn't work with your pattern: [^"] matches a backslash as readily as any other character, so the backslash is consumed on its own and the quote after it ends the string. There are two ways around this. Within quotes, you can accept a backslash followed by any character, or any run of non-backslash, non-quote characters. Or you can accept a minimal match of any characters leading up to a quote that is not preceded by a backslash. I illustrate both of these here (along with the use of /x):

    my @matches = m/(
          [^"']+
        | (?: " (?:\\.|[^\\"]+)* " )   # Double quote
        | (?: ' .*? (?<!\\) ' )        # Single quote
    )/gx;
    Update: note that the second version (the lookbehind one) will not recognize that \\' does not protect the quote, because there the backslash is itself escaped.
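
To make the difference between the two alternatives concrete, here is a hedged sketch that wraps the pattern in a helper and feeds it two inputs. The name tokenize and both sample strings are hypothetical; only the pattern comes from this reply. Note that inside q{} a doubled backslash yields a single backslash, so the second input contains a single-quoted string ending in two backslashes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured chunks in order.
sub tokenize {
    my ($text) = @_;
    return $text =~ m/(
          [^"']+
        | (?: " (?:\\.|[^\\"]+)* " )   # escape pairs or plain runs
        | (?: ' .*? (?<!\\) ' )        # minimal match plus lookbehind
    )/gx;
}

# Escaped double quotes stay inside the quoted element.
my @dq = tokenize(q{say "a \"b\" c" . $x;});
print "[$_]\n" for @dq;

# The literal here is 'a' followed by two backslashes, so the quote after
# them really does close the string, but the lookbehind only sees "a
# backslash precedes this quote" and keeps scanning.
my @sq = tokenize(q{f('a\\\\', 'b')});
print "[$_]\n" for @sq;
```

In the second run the single-quoted element swallows everything up to the next quote, and the later, now unbalanced, quote is silently skipped. If single-quoted strings need the same protection, reusing the escape-pair form from the double-quote branch with ' in place of " would be the obvious thing to try.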

    Caution: Contents may have been coded under pressure.