ManyCrows has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to write a sub to return an array of words and quote-delimited phrases when passed a string, but I can't seem to get the regular expression right no matter what I do.

I don't need to deal with arbitrary nesting; in fact, I'm only going one level deep. I also don't need balanced text parsing. I truly just want the quote-delimited phrases and the words, hopefully with a single regexp.

Suggestions? Pointers? Answers?

Gracias!

Re: phrase and word searching
by chipmunk (Parson) on Jun 01, 2001 at 00:16 UTC
    Here's a regex that should get you started: /("[^"]*"|[^\s"]+)/g. This regex makes the following two assumptions: a word is a sequence of non-whitespace, non-quote characters, and a quotation cannot contain embedded quotes.
Here's an example:
    sub split_text {
        $_[0] =~ /("[^"]*"|[^\s"]+)/g;
    }

    my $text = 'The dog said "Hi, how are you?" I laughed.';
    my @pieces = split_text($text);
    print "$_\n" for @pieces;

    __END__
    The
    dog
    said
    "Hi, how are you?"
    I
    laughed.
      Perfection! Muchas gracias!
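If the surrounding quotes are not wanted in the result, one possible variation is to capture the quoted text and the bare word in separate groups and keep whichever one matched. This is only a minimal sketch under the same two assumptions as above (no embedded quotes, words contain no whitespace or quotes); the name split_text_unquoted is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original posts).
# Returns words and quoted phrases, with the surrounding quotes removed.
sub split_text_unquoted {
    my ($text) = @_;
    my @pieces;
    # $1 is set when a quoted phrase matched, $2 when a bare word matched.
    while ($text =~ /"([^"]*)"|([^\s"]+)/g) {
        push @pieces, defined $1 ? $1 : $2;
    }
    return @pieces;
}

my $sample = 'The dog said "Hi, how are you?" I laughed.';
print "$_\n" for split_text_unquoted($sample);
```

Note that an empty pair of quotes ("") comes back as an empty string here; filter the result with grep { length } if those should be dropped.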
(jeffa) Re: phrase and word searching
by jeffa (Bishop) on Jun 01, 2001 at 00:29 UTC
    And here is another way (you can wrap it in a function if you wish) that returns the text inside the quotes without the quotes:
    my $str = qq|"this is one" and "anot""her one" is "" right "her"e|;
    my @stuff = map { /([^"]+)"$/ } $str =~ /"([^"]*"|[^\s"]+)/g;
    print join("\n", @stuff), "\n";
    The first regular expression returns a list like so:
    this is one"
    anot"
    her one"
    "
    her"
    and the second regex inside the map removes the leftover quote. Because a failed match returns an empty list inside map, the 'empty' ones are dropped, so the final list looks like:
    this is one
    anot
    her one
    her
    I am sure that this process could be compacted into yet another single regex, but it eludes me.

    Jeff
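As a follow-up to the idea of compacting this into one regex: a possible sketch is to capture only the text between each pair of quotes and let grep discard the empty captures. This is not strictly a single regex (the grep still does the filtering), and the helper name quoted_phrases is hypothetical. It assumes, as before, that quotes are never nested or escaped.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the original posts).
# Returns the text inside each pair of double quotes, skipping empty pairs.
sub quoted_phrases {
    my ($str) = @_;
    # In list context, m//g returns every capture; grep drops the "" cases.
    return grep { length } $str =~ /"([^"]*)"/g;
}

my $str = qq|"this is one" and "anot""her one" is "" right "her"e|;
print "$_\n" for quoted_phrases($str);
```

Using [^"]* rather than [^"]+ matters here: with +, the empty "" pair would not be consumed as a unit, and the regex would then pair its second quote with the next opening quote and capture " right " by mistake.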
