I recently needed to parse whitespace-separated words out of a string, where a double-quoted word may itself contain spaces but no escaped double quotes. The following regex does the job, is quite tidy, and hardly backtracks (occasionally by a few characters, no more).
my @words = /"?((?<!")\S+(?!"\s)|[^"]+)"?\s*/g;
Comment on Simple way to parse whitespace separated, optionally quoted words out of a string
Your method has its flaws. It does not require proper balancing of quotes, and it breaks q{"oops"I did it again"} into q{oops} and q{I did it again} -- that is, two strings, where it should be five. It also returns q{""} for the string q{""}, when it should probably return just an empty string (q{}).
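For anyone wanting to reproduce these two cases, here is a minimal sketch. `parse_v1` is a hypothetical wrapper around the regex from the original post (the post itself matches against `$_`), and the sample inputs are the ones quoted above plus one well-behaved string for comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the regex from the original post to a string
# and returns the captured words (list context, /g returns every $1).
sub parse_v1 {
    my ($str) = @_;
    return $str =~ /"?((?<!")\S+(?!"\s)|[^"]+)"?\s*/g;
}

for my $input ('foo "bar baz" qux', '"oops"I did it again"', '""') {
    my @words = parse_v1($input);
    print "$input => ", join(' | ', map { "[$_]" } @words), "\n";
}
# foo "bar baz" qux => [foo] | [bar baz] | [qux]
# "oops"I did it again" => [oops] | [I did it again]
# "" => [""]
```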
_____________________________________________________
Jeff [japhy] Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Thanks for your points. The latter was easy to fix, but the first one caused me a fair bit of work, and I wasn't able to come up with a solution that works with only a single capturing pair of parens (my original goal). The following shouldn't fail no matter how pathological the input you throw at it, but it requires a grep to discard the extra capture.
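my @word = do {
    my $i = 0;
    # Each match returns two captures: the optional quote, then the word.
    # The grep keeps the odd-indexed elements, i.e. just the words.
    # (?(1) ") closes the quote only if one was opened; a bare \1 would
    # fail for unquoted words, because group 1 is unset there.
    grep $i++ & 1, m/ (")? ((?(1) [^"]* | \S+)) (?(1) ") \s* /xg
};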
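As a rough check of the two-capture approach against the cases raised earlier, here is a small sketch. `parse_v2` is a hypothetical wrapper, and it writes the closing quote as the conditional `(?(1) ")`. That is an assumption of this sketch: in Perl a backreference to a group that did not participate in the match fails, so the conditional keeps unquoted words matching.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words found in $str.
# The closing quote is matched with (?(1) "), an assumption of this sketch.
sub parse_v2 {
    my ($str) = @_;
    my $i = 0;
    # The match yields (quote, word) pairs; keep only the odd-indexed words.
    return grep { $i++ & 1 }
        $str =~ m/ (")? ((?(1) [^"]* | \S+)) (?(1) ") \s* /xg;
}

for my $input ('foo "bar baz" qux', '"oops"I did it again"', '""') {
    my @words = parse_v2($input);
    print "$input => ", join(' | ', map { "[$_]" } @words), "\n";
}
# foo "bar baz" qux => [foo] | [bar baz] | [qux]
# "oops"I did it again" => [oops] | [I] | [did] | [it] | [again"]
# "" => []
```

The pathological string now comes back as five words, and `""` yields a single empty string. If two captures per match are a nuisance, the core module Text::ParseWords offers `shellwords` for similar jobs, though its quoting rules (backslash escapes, single quotes) differ from the ones assumed here.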