Ah, the master speaks. When I saw your post filled with (? syntax, I knew you had addressed the subtleties of the problem.

So, let me understand... the first thing, (?=\S), will fail if the next character is whitespace or there is no next character. I wonder why we need that? Ah, it interacts with the /g to say "no match at this position", so the match actually skips the spaces! And the spaces naturally don't wind up in the returned array. Beautiful.
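To convince myself, here is a minimal sketch of that behavior. The pattern is my own reconstruction from the description in this post, so treat it as an assumption rather than the exact code from the parent node; the variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Assumed reconstruction of the pattern, pieced together from the
# description in this post; not copied from the parent node.
my $token = qr/
    (?=\S)          # fails at whitespace and at end of string
    [^\s"]*         # run of characters that are neither space nor quote
    (?:
        "[^"]*"     # a quoted section, which may contain spaces
        [^\s"]*     # more non-space, non-quote characters
    )*
/x;

# Hypothetical input with leading, trailing, and repeated spaces.
my $line   = q{  alpha   "beta gamma"  delta  };
my @fields = $line =~ /$token/g;

# The lookahead rejects every position that sits on whitespace, so the
# scan moves on and no empty or blank entries reach the list.
print join('|', @fields), "\n";    # prints: alpha|"beta gamma"|delta
```

Without the lookahead, every piece of the pattern is optional, so it could match the empty string at each space. Note that the quotes stay in the returned fields.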

I also like the way the non-quote stuff is always first, and the quoted part is an optional part that follows, rather than having two totally different cases.

So first it matches everything that's not whitespace or a quote.

Then it picks up the open quote, the stuff inside it, and the close quote. Then it has [^\s"]* again, and the whole thing is in a repeat star. That means it will handle any token with an even number of quotes in it, not just a single quoted pair, and the token need not end on the close quote.

That is an interesting generalization, and I rather like it.
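A quick check of that generalization, again using my reconstruction of the pattern (an assumption, not the parent node's code) and a made-up input with two quoted sections inside one token:

```perl
use strict;
use warnings;

# Same assumed pattern as described above, written compactly.
my $field_re = qr/(?=\S)[^\s"]*(?:"[^"]*"[^\s"]*)*/;

# Hypothetical input: the middle token holds four quotes (two pairs),
# with plain text before, between, and after the quoted sections.
my $text  = q{one pre"a b"mid"c d"post two};
my @parts = $text =~ /$field_re/g;

print scalar(@parts), " fields\n";   # prints: 3 fields
print "$_\n" for @parts;
# one
# pre"a b"mid"c d"post
# two
```

The repeat star goes around the loop twice for the middle token, once per quoted pair. As far as I can tell, an unbalanced quote is a different story: the quoted part cannot match there, so this sketch only speaks to the even-quote case.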

I suppose you couldn't pull the non-quote non-whitespace part out of the loop because it must be performed at least once. Ah, but you know it's not a space already, and taking out the 3rd line and changing the 7th line from * to + would work, and further allow things that begin with a quote. Would it not?

—John

In reply to Re: Re: Not quite a simple split by John M. Dlugosz
in thread Not quite a simple split by John M. Dlugosz
