Of course. What I'm currently using is this:
m/([^"']+|(?:"(?:[^"]|\\")*")|(?:'(?:[^']|\\')*'))/g
Which does what I want. I'll look into the \G variant and see if that makes sense in my head (I have occasional problems with wrapping my head around regex stuff). My intent is to split a block of text into a list of elements, alternating between a quoted string (with the quotes) and a non-quoted string.
For example, the string
print ( "some stuff", $more_stuff, "final stuff" );
should become this:
@list = (
q/print ( /,
q/"some stuff"/,
q/, $more_stuff, /,
q/"final stuff"/,
q/ );/
)
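A minimal sketch of that split, assuming the text is held in a scalar (the name $text is only for illustration) and using the pattern quoted above unchanged. In list context, m//g returns the capture from every successive match, which gives the alternating list directly:

```perl
use strict;
use warnings;

# Sample input from the example above; single quotes keep $more_stuff literal.
my $text = 'print ( "some stuff", $more_stuff, "final stuff" );';

# The pattern currently in use. In list context with /g it returns the
# captured text of each successive match, in order.
my @list = $text =~ m/([^"']+|(?:"(?:[^"]|\\")*")|(?:'(?:[^']|\\')*'))/g;

# Show each element in brackets so leading and trailing spaces are visible.
print "[$_]\n" for @list;
```

This prints the five elements listed above, one per line.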
To get a backreference to a quote, you have to put the quote in parens, which means it is going to be returned as a separate group. So I think you're going to have to stay with the separate alternatives for each type of quote.
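To make the extra-group problem concrete, here is a rough sketch of what a backreference version might look like. The pattern and the sample string are made up for illustration and only pick out the quoted parts. The point is that the quote captured for \2 comes back as its own list element:

```perl
use strict;
use warnings;

# Made-up sample with one double-quoted and one single-quoted string.
my $text = q{say "hi" and 'bye'};

# Group 2 captures the opening quote so that \2 can demand the same
# closing quote. Group 1 wraps the whole quoted string.
my @parts = $text =~ m/((["'])(?:\\.|(?!\2)[^\\])*\2)/g;

# Each match contributes two elements: the quoted string, then the quote.
print "[$_]\n" for @parts;
```

The list comes back with four elements rather than two, because each quoted string is followed by the bare quote character from the inner group.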
The /x option is straightforward: whitespace within your regex is ignored (unless it is escaped or inside a character class), so you can pretty it up as you like. You can also put comments in it. I recommend you jump right into using it.
The \G anchor matches only at the position where the previous m//g match on that string left off (its pos()). I don't think it's going to help you with what you're trying to do here.
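In case it helps to see \G in action, here is a small sketch on a throwaway string (the variable names are just for illustration). It shows that \G pins the next attempt to pos(), while the same pattern without \G is free to scan ahead:

```perl
use strict;
use warnings;

my $text = 'aaa bbb';

# A scalar-context m//g match leaves pos($text) just past the match.
my $found = $text =~ m/a+/g;
my $after_first = pos($text);

# \G pins the next attempt to pos($text). The character there is a space,
# so b+ cannot match. The c flag keeps pos() from being reset on failure.
my $anchored = $text =~ m/\Gb+/gc ? 'matched' : 'no match';

# Without \G the same pattern may scan forward from pos() and finds "bbb".
my $unanchored = $text =~ m/b+/gc ? 'matched' : 'no match';

print "$after_first $anchored $unanchored\n";
```

This prints "3 no match matched".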
I notice that the backslash-protection of quotes doesn't work with your pattern: [^"] happily matches the backslash itself, so the quote that follows is then taken as the closing quote. There are two ways to fix that. Within quotes, you can accept either a backslash followed by any character or a run of non-backslash, non-quote characters. Or you can accept a minimal match of any characters leading up to a quote that is not preceded by a backslash. I illustrate both of these here (along with the use of /x):
my @matches = m/( [^"']+          # Run of non-quote characters
|(?: " (?:\\.|[^\\"]+)* " )       # Double quote: consume escapes as pairs
|(?: ' .*? (?<!\\)' )             # Single quote: lookbehind for a backslash
)/gx;
Update: note that the second version will not recognize that \\' does not protect the quote.
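A quick sketch of that difference, using a made-up input in which a single-quoted string ends in an escaped backslash. Both approaches are applied to single quotes here so they can be compared directly (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Made-up input: 'a\\' b 'c'  (two literal backslashes before the 2nd quote).
my $text = join '', "'a", ('\\' x 2), "' b 'c'";

# First approach: consume backslash pairs, so \\ is used up before the quote.
my ($consume) = $text =~ m/( ' (?: \\. | [^\\']+ )* ' )/x;

# Second approach: minimal match up to a quote not preceded by a backslash.
my ($lookbehind) = $text =~ m/( ' .*? (?<!\\) ' )/x;

print "consume:    $consume\n";
print "lookbehind: $lookbehind\n";
```

The first pattern stops at the quote right after the two backslashes. The second skips that quote, because it only sees the backslash immediately before it, and runs on to the next quote in the text.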
Caution: Contents may have been coded under pressure.