use strict;
use warnings;
use feature 'say';

# use Regexp::Common;
# ^^^ Not used. I'm so lazy, I just peeked at $RE{quoted}
# to construct the "$quoted" expression below, by slightly
# modifying it (see "$") to satisfy the third clause.
# And actually 2nd test case below is to test how it works,
# it seems there's not a similar one among your 18.

my $quoted = qr/
    (?:(?|
        (?:(?<!\\)\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
        (?:(?<!\\)\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/(?:$quoted|[^ ])+\K(?: |$)/;

my @tests = (
    q(This 'isn\'t nice.'),
    q(This 'isn\'t nice.),
    q(This \"isnt unnice.\"),
);

for my $t ( @tests ) {
    say "[$_]" for split $re, $t;
}

__END__
[This]
['isn\'t nice.']
[This]
['isn\'t nice.]
[This]
[\"isnt]
[unnice.\"]

Update, 10 minutes later: aargh, I added a negative look-behind to cover your 14th case (and added my third test). There may be more cases to add. Further: it's trickier than I thought, cases 6 (and 7) are split into 3 groups, but the wrong ones. Will look into that later. False alarm? Will check again later :)

Next-morning update: as LanX pointed out, a negative look-behind for just a single backslash isn't enough. So, to save this answer (I like how the "keep" \K escape helps in a regexp for split, it's kind of interesting), maybe it's easier to revert $quoted to the way it was borrowed from $RE{quoted}, and tweak $re instead:

my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/
    (?:
        (?:\\\\)+
      | (?:\\[^ ])
      | $quoted
      | [^ ]
    )+
    \K
    (?: \ | $ )
/x;
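To illustrate why a look-behind for a single backslash falls short, here is a minimal sketch. The input string and both helper subs are hypothetical (they are not from the thread), and the reading of LanX's objection as "an escaped backslash right before a quote" is my assumption. The second helper pairs up backslashes before deciding whether a quote is escaped.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Offset of the first double quote NOT preceded by a backslash, or -1.
sub first_quote_naive {
    my ($str) = @_;
    return $str =~ /(?<!\\)(")/ ? $-[1] : -1;
}

# Offset of the first double quote preceded by an even number of
# backslashes (zero included), i.e. a quote that is not escaped; or -1.
sub first_quote_paired {
    my ($str) = @_;
    return $str =~ /(?<!\\)(?:\\\\)*(")/ ? $-[1] : -1;
}

# Hypothetical input: a\\"b c"
# (inside q(), each \\ stands for one literal backslash)
my $s = q(a\\\\"b c");

print "naive:  ", first_quote_naive($s),  "\n";
print "paired: ", first_quote_paired($s), "\n";
```

The naive version walks past the quote that follows the two backslashes, although those backslashes escape each other and the quote is a real one. Consuming escape pairs as whole units, which is what the `(?:\\\\)+` and `(?:\\[^ ])` alternatives in $re above do, avoids the problem without any look-behind.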

I hope it works now; my 1st attempt at this "update" was broken (it's shown below, but better not look, there's nothing interesting in it. Sorry for the mess). Still, it's unclear whether to split on an escaped space, or how to treat several spaces in a row.

# (broken first attempt, kept only for reference)
my $quoted = qr/
    (?:(?|
        (?: (?:[^\\\'\ ]*(?:\\[^\ ][^\\\'\ ]*)*) \" )
        (?: [^\\\"]* (?: \\ . [^\\\"]* )* )
        (?:\"|$)
      |
        (?:(?:[^\\\' ]*(?:\\[^ ][^\\\' ]*)*)\')(?:[^\\\']*(?:\\.[^\\\']*)* +)(?:\'|$)
    ))
/x;

And a later (final?) update: sigh... damn lack of practice. So this:

my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
      |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/
    (?:
        (?:\\.)+
      | $quoted
      | [^ \\"']+
    )*
    \K
    (?: \ | $ )+
/x;

# and later:
my $got = [ split $re, $str ];

passes all tests in LanX's later answer except #2 and is somewhat optimized.
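As a quick check of how this last version behaves, here is a small sketch that runs it over the three test strings from the top of this post, plus one made-up input with two spaces in a row. The `fields` helper and the fourth string are hypothetical; the two regexes are copied from the update above.

```perl
use strict;
use warnings;

my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
      |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/ (?: (?:\\.)+ | $quoted | [^ \\"']+ )* \K (?: \ | $ )+ /x;

# Hypothetical helper: split one string and return an array ref of fields.
sub fields {
    my ($str) = @_;
    return [ split $re, $str ];
}

my @tests = (
    q(This 'isn\'t nice.'),
    q(This 'isn\'t nice.),
    q(This \"isnt unnice.\"),
    q(two  spaces),            # hypothetical: two spaces in a row
);

for my $t (@tests) {
    print join( '', map { "[$_]" } @{ fields($t) } ), "\n";
}
```

Everything before \K is matched but kept out of the separator, so split only ever removes the spaces (or the empty match at the end of the string). The trailing `+` is what lets a run of spaces count as a single separator.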

About test #2: the consensus is that "the brief is unclear": must split-like behaviour generate an empty leading field for #2? The expression to split on is definitely neither missing nor a literal space. If, nevertheless, it must not (my solution does, and so fails #2), then my bad; but still, yeah, this regexp "works" and can be used to literally split on. :)
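For reference, the general split behaviour behind that question can be sketched as follows. The input string is hypothetical, and whether test #2 hinges on exactly this case is my assumption; the thread's test data isn't reproduced here.

```perl
use strict;
use warnings;

my $str = ' a b';    # hypothetical input with one leading space

# A regex separator that matches at the very start of the string
# produces an empty leading field.
my @by_regex = split / /, $str;    # ('', 'a', 'b')

# The special pattern ' ' (a literal single-space string) skips leading
# whitespace, so no empty leading field appears.
my @by_space = split ' ', $str;    # ('a', 'b')

print scalar(@by_regex), " fields with / /\n";
print scalar(@by_space), " fields with ' '\n";
```

Since $re is a real regex and not the special `' '` pattern, an empty leading field for input that starts with a space is what split is documented to return.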


In reply to Re: solution wanted for break-on-spaces (w/specifics) by vr
in thread solution wanted for break-on-spaces (w/specifics) by perl-diddler
