Please note that I am at an Internet café right now and thus cannot test anything that I am writing, so be gentle with me :)
The regex you listed is better, but you should be aware that if you're working with data that someone else supplies, you may have to deal with escaped quotes. The regex /"([^"]+)"/g will probably not behave as you expect with the following:
my $string = q!"This is \"data\""!;
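If you'd like to see the problem rather than take my word for it, here is a minimal sketch. The variable name @naive is just mine for illustration, and the string is built with q!...! so that the backslashes stay in the data literally.

```perl
use strict;
use warnings;

# q!...! keeps the backslashes as-is, so the data is: "This is \"data\""
my $string = q!"This is \"data\""!;

# The naive pattern treats every double quote as a delimiter,
# whether or not it is preceded by a backslash.
my @naive = $string =~ /"([^"]+)"/g;

print "captured: $_\n" for @naive;
```

The only capture is the text up to the first escaped quote (This is \), which is almost certainly not what was intended.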
So, we try the following:
$string =~ /"((?:\\"|[^"])*)"/;
Break that out:
$string =~ /"           # first quote
            (           # capture to $1
              (?:       # non-capturing parens
                \\"     # an escaped quote
                |       # or
                [^"]    # a non-quote
              )*        # end grouping (zero or more of above)
            )           # end capture
            "/x;        # last quote
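Looks good. We allow for escaped quotes, but what if the string is something like "test\"? That's poorly formed: the backslash escapes the closing quote, so the string never actually ends. For data to end in a literal backslash, the backslash itself has to be escaped, so we'll probably also have to allow escaped escapes (sigh). That means handling a string like "test\\". The following should be pretty close to what you want: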
$string =~ /"((?:\\["\\]|[^"])*)"/;
It's really ugly, but should be closer to what you might need. However, regular expressions such as these can get quite hairy. I understand that Text::Balanced is perfect for issues like this, but I have never used it.
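To get a feel for how that last pattern behaves, here is a small sketch that runs it over three sample strings. The $strict variant is my own assumption and not part of the pattern above: it removes the backslash from the "ordinary character" class so that a stray backslash cannot be picked up by backtracking. The samples sit in a single-quoted heredoc so the backslashes are taken literally.

```perl
use strict;
use warnings;

# The pattern from above: an escaped quote or escaped backslash, or any non-quote.
my $loose  = qr/"((?:\\["\\]|[^"])*)"/;

# Assumed tightening (hypothetical, for comparison only): a bare backslash
# no longer counts as an ordinary character.
my $strict = qr/"((?:\\["\\]|[^"\\])*)"/;

# One sample per line; <<'END' does no backslash processing at all.
my @samples = split /\n/, <<'END';
"This is \"data\""
"test\\"
"test\"
END

my (@loose_hits, @strict_hits);
for my $s (@samples) {
    my ($l) = $s =~ $loose;     # undef when there is no match
    my ($t) = $s =~ $strict;
    push @loose_hits,  $l;
    push @strict_hits, $t;
    printf "%-20s loose: %-18s strict: %s\n", $s,
        defined $l ? $l : '(no match)',
        defined $t ? $t : '(no match)';
}
```

Both patterns agree on the first two samples. On the malformed "test\" the pattern above still matches by backtracking (it captures test\ and treats the escaped quote as the closing one), while the stricter variant reports no match. Which behavior you want depends on how forgiving you need to be with bad data.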
Cheers,
Ovid