
I was anticipating this very answer, but didn't want to clobber my first post and hoped I wouldn't have to write this reply. :-)

Short version:

It's unnecessary to mention leading empty fields in that paragraph as default behaviour because

- where this sentence currently stands, there's no specification of how split() works, only of what it returns,
- ~~there's only one case that doesn't produce an otherwise expected empty leading field (singular),~~
- there's a conflict over whether a list consisting only of empty fields holds leading or trailing empty fields, since they can't be considered both in this case,
- ~~the only case~~ those cases that don't produce an expected leading empty field (singular) are well documented and are already described in a way that doesn't conflict with trailing empty fields, and
- it reduces the complexity of the documentation without losing any information.

See Re^2: splitting nothing? for a suggested documentation patch.

Update: This doesn't mean that clarifications of how split() works shouldn't be made. I'm just arguing that adding yet another rule to how it works isn't the way to go, and that by removing the sentence in question we actually make the documentation of split clearer.

The really long version for the particularly interested:

As I see it, there are at least two ways to solve this. One is to do what the patch at Re^3: splitting nothing? does and introduce yet more complexity by saying that empty leading fields that are also empty trailing fields aren't empty leading fields but empty trailing fields. The other is to attack the problem at the root and not confuse the reader with leading empty fields at all.

You can tell split() not to ignore trailing empty fields. However, you cannot tell split() to discard leading empty fields in the general case; it's only done for one particular case (if one chooses to see it as removal of empty fields rather than as skipping of leading whitespace; see below). To me, it's more confusing to call it a default behaviour than to just document the special case.
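A small illustration of that asymmetry (the input string ",a,b,," is just an arbitrary example, not something from the thread): a negative LIMIT brings back the trailing empty fields, but no LIMIT value removes the leading one.

```perl
use strict;
use warnings;

my $str = ",a,b,,";

# Default LIMIT: trailing empty fields are stripped, the leading one is kept.
my @default = split /,/, $str;        # ('', 'a', 'b')

# Negative LIMIT: trailing empty fields are kept as well.
my @keep_all = split /,/, $str, -1;   # ('', 'a', 'b', '', '')

# No LIMIT value drops the leading empty field; that only happens for
# the special ' ' pattern, where leading whitespace is skipped.
print scalar(@default), " ", scalar(@keep_all), "\n";   # prints "3 5"
```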

This "undefault" behaviour is already explained in the documentation:

If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace).

As we see, the documentation already resolves this issue by saying that, for this special case, leading whitespace is skipped rather than first being split on, with the resulting empty leading field removed afterwards. (My English isn't good enough to judge whether the documentation should put whitespace in the plural or singular, and whether it can be read as splitting on /\s/ rather than /\s+/.)

`split;` is equivalent to `do { split /\s+/, /\s*(.*)/s && $1 }` for defined values of `$_`. The `/\s+/` pattern can produce at most one leading empty field, which makes it excessive and confusing to talk about leading empty fields in the plural.
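The equivalence can be checked with a quick sketch like the one below; the sample string is an assumption chosen to have leading, repeated and trailing whitespace. It also shows that a plain `/\s+/` split yields exactly one leading empty field, no matter how much leading whitespace there is.

```perl
use strict;
use warnings;

$_ = "   foo  bar baz ";

my @plain = split;                                 # split ' ' on $_
my @equiv = do { split /\s+/, /\s*(.*)/s && $1 };  # the equivalent form
my @regex = split /\s+/;                           # one leading empty field

print join('|', @plain), "\n";   # foo|bar|baz
print join('|', @equiv), "\n";   # foo|bar|baz
print join('|', @regex), "\n";   # |foo|bar|baz
```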

This is further explained:

A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field.

... and the first field can also be the last, and we have said something about the last field if it's empty but nothing about the first field, so there's no problem here either (except for `split(/\s+/, '', -1)`, which produces an empty list; but that's another issue, and it's also documented in perlfunc a couple of paragraphs above: "Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.").
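To make the two quoted rules concrete, here is a short sketch contrasting `' '` with `/\s+/` on a string with one leading space, plus the empty-string exception. The sample input is an illustrative assumption.

```perl
use strict;
use warnings;

my $s = " foo bar";

my @ws    = split ' ',   $s;      # ('foo', 'bar'): leading whitespace skipped
my @re    = split /\s+/, $s;      # ('', 'foo', 'bar'): null first field
my @empty = split /\s+/, '', -1;  # (): an empty EXPR always gives the empty list

print scalar(@ws), " ", scalar(@re), " ", scalar(@empty), "\n";   # prints "2 3 0"
```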

I really believe that the magical disappearance of the leading empty field is documented well enough to justify my suggestion. If one really, really feels it's out of place not to mention this special case in the same sentence or paragraph (which would be a real pain if it were always done in the perldocs, as Perl is full of special cases), just add a parenthetical that says "except for the special ' ' pattern; see below".

Not mentioning leading empty fields avoids having to decide whether ('')[0] is a leading or a trailing empty field, and at the same time reduces the complexity of the documentation.
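The leading/trailing ambiguity can be seen with a string that is nothing but a single separator (an illustrative input, not one taken from the thread). The field before the comma is a leading empty field, yet with the default LIMIT it disappears along with the one after it, i.e. it is treated as trailing.

```perl
use strict;
use warnings;

# "," yields two empty fields: one before the comma, one after it.
my @dflt = split /,/, ",";        # (): both stripped as trailing empty fields
my @all  = split /,/, ",", -1;    # ('', ''): both kept

print scalar(@dflt), " ", scalar(@all), "\n";   # prints "0 2"
```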

ihb

In reply to Re^4: splitting nothing? by ihb
in thread splitting nothing? by bageler
