in reply to Re: In need of a stupid regex trick
in thread In need of a stupid regex trick

perl -wle '@l = split(/(?:"([^"]*)"|\s+)/, $ARGV[0]);$,="\t";print @l;' 'one "two three" four "five"'

Re: Re: Re: In need of a stupid regex trick
by ysth (Canon) on Jan 05, 2004 at 02:40 UTC
    Nice.

    Slight tweaks to require space around "quoted string" (which you may or may not want) and remove undef entries. Update: and remove empty entries.

    perl -wle'@list = grep defined && length, split /(?:(?<!\S)"([^"]*)"(?!\S))|\s+/, shift;print for @list' 'one "two three" four "five"'
    perl -wle'@list = grep defined, split /(?:(?<!\S)"([^"]*)"(?!\S))|\s+/, shift;print for @list' 'one "two three" four "five"'
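To make the effect of the lookarounds concrete, here is a minimal sketch that runs both patterns on an input whose first quoted string has no whitespace around it. The helper names split_loose and split_strict are hypothetical and only for illustration; the patterns are the ones from the posts above, with the redundant outer (?:) dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: the original pattern, quotes recognised anywhere.
sub split_loose {
    my ($str) = @_;
    return grep { defined($_) && length($_) } split /"([^"]*)"|\s+/, $str;
}

# Hypothetical helper: the tweaked pattern, quotes recognised only when
# bounded by whitespace or the start/end of the string.
sub split_strict {
    my ($str) = @_;
    return grep { defined($_) && length($_) }
        split /(?<!\S)"([^"]*)"(?!\S)|\s+/, $str;
}

my $input = 'one"two three"four "five"';
print join('|', split_loose($input)),  "\n";   # one|two three|four|five
print join('|', split_strict($input)), "\n";   # one"two|three"four|five
```

With the strict pattern, a quote glued to a neighbouring word is left alone and the text is split on whitespace only, which is the "may or may not want" trade-off mentioned above.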
      I can't see how it could produce undef. Do you happen to have an example that yields undef entries?
        Taking the example from the post I commented on, the code had split(/(?:"([^"]*)"|\s+)/. The outer (?:) is actually useless and can be removed. Running the simplified pattern gives:
        use Data::Dumper;
        $Data::Dumper::Terse = 1;
        print Dumper $_ for split /"([^"]*)"|\s+/, 'this "is" an "example"';
        __END__
        'this'
        undef
        ''
        'is'
        ''
        undef
        'an'
        undef
        ''
        'example'
        Update: rewrote following paragraph for clarity.

        The split breaks the string 'this "is" an "example"' wherever there is a quoted string or a run of whitespace. This yields the substrings "this", "", "", "an", and "". Because the split pattern contains one set of capturing parentheses, one extra value is returned for each place the string was split (this is a feature that allows you to split on several delimiters and see which delimiter was used). These extra values are undef, "is", undef, undef, and "example". Where the split was on whitespace, the value is undef because the capturing parentheses did not take part in the match. Where the split was on a quoted string, the value is what was inside the quotes.
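The interleaving described above can be checked by pulling the returned list apart again. This is only a sketch: the sub name fields_and_captures is hypothetical, and it assumes a pattern with exactly one capturing group, so the list alternates field, capture, field, capture.

```perl
use strict;
use warnings;

# Hypothetical helper: separate split's ordinary fields from the extra
# values contributed by the single capturing group.
sub fields_and_captures {
    my ($str) = @_;
    my @parts = split /"([^"]*)"|\s+/, $str;

    my (@fields, @captures);
    while (@parts) {
        push @fields, shift @parts;               # text between delimiters
        push @captures, shift @parts if @parts;   # capture, undef for whitespace
    }
    return (\@fields, \@captures);
}

my ($fields, $captures) = fields_and_captures('this "is" an "example"');
print join(', ', map { "'$_'" } @$fields), "\n";
print join(', ', map { defined($_) ? "'$_'" : 'undef' } @$captures), "\n";
```

The first line printed lists the five substrings and the second the five extra values, in the same order as in the paragraph above.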