in reply to Re^2: splitting nothing?
in thread splitting nothing?

Because that's only the default. split " " (but not split / /) doesn't preserve leading empty fields.
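
A minimal sketch of that difference, assuming a sample string with two leading spaces (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $str = "  a b";

# The special ' ' pattern skips leading whitespace before splitting.
my @special = split ' ', $str;

# A literal single-space regex produces one empty field per leading space.
my @literal = split / /, $str;

printf "' ' gives %d fields, / / gives %d fields\n",
    scalar @special, scalar @literal;
```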

Re^4: splitting nothing?
by ihb (Deacon) on Jul 17, 2004 at 19:49 UTC

    I was anticipating this very answer, but didn't want to clobber my first post and hoped I wouldn't have to write this reply. :-)

    Short version:

    It's unnecessary to mention leading empty fields in that paragraph as default behaviour because

    • where this sentence currently stands, there's no specification on how split() works--just what it returns,
    • there's only one case that doesn't produce an otherwise expected empty leading field (singular),
    • there's a conflict of whether a list with only empty fields holds leading or trailing empty fields as they can't be considered both in this case,
    • the only case that doesn't produce an expected leading empty field (singular) is well documented, and is already written in a way that doesn't conflict with trailing empty fields, and
    • it reduces complexity of the documentation without losing any information.

    See Re^2: splitting nothing? for a suggested documentation patch.

    Update: This doesn't mean that clarifications on how split() works shouldn't be made. I'm just arguing that adding yet another rule to how it works isn't the way to go, and that by removing the sentence in question we actually make the documentation of split clearer.

    The really long version for the particularly interested:

    As I see it, there are at least two ways to solve this. One way is to do as the patch at Re^3: splitting nothing? does and introduce yet more complexity by saying that empty leading fields that are also empty trailing fields aren't empty leading fields but empty trailing fields. Another is to attack the problem at the root and not confuse the reader with leading empty fields at all.

    You can tell split() to not ignore trailing empty fields. However, you cannot tell split() to disregard leading empty fields in the general case--it's only done for a particular case (if one chooses to look at it as removal of empty fields rather than skipping of leading whitespace--see below). For me, it's more confusing to say that it's a default behaviour than to just document the special case.
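
    The asymmetry can be sketched like this, assuming comma-separated sample strings; a negative LIMIT changes what happens to trailing empty fields but has no counterpart for leading ones:

```perl
use strict;
use warnings;

my $trailing = "a,b,,,";
my @stripped = split /,/, $trailing;        # trailing empty fields dropped
my @kept     = split /,/, $trailing, -1;    # negative LIMIT keeps them

my $leading = ",,a";
my @lead_default = split /,/, $leading;     # leading empty fields are kept
my @lead_limit   = split /,/, $leading, -1; # LIMIT makes no difference here

printf "trailing: %d vs %d fields\n", scalar @stripped, scalar @kept;
printf "leading:  %d vs %d fields\n", scalar @lead_default, scalar @lead_limit;
```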

    This "undefault" behaviour is already explained in the documentation:

    If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace).

    As we see, the documentation already resolves this issue by saying that for this special case the leading whitespace is skipped, rather than first splitting on it and then removing the resulting empty leading field. (My English isn't good enough to judge whether the documentation should put whitespace in plural or singular, and whether the documentation can be interpreted to split on /\s/ rather than /\s+/.)

    split; is equivalent to do { split /\s+/, /\s*(.*)/s && $1 } for defined values of $_. The /\s+/ pattern would produce at most one leading empty field, which makes it excessive and confusing to talk about leading empty fields in the plural.
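
    A rough way to check that equivalence on a few inputs. The helper explicit_split is hypothetical and spells the expression out with an intermediate variable instead of $1; the sample strings are assumptions chosen to cover leading, trailing, and whitespace-only input:

```perl
use strict;
use warnings;

# Hypothetical helper: the spelled-out form of a bare split on $_.
sub explicit_split {
    my ($text) = @_;
    my ($rest) = $text =~ /\s*(.*)/s;   # skip any leading whitespace
    return split /\s+/, $rest;          # then split on runs of whitespace
}

for ("  foo bar  baz ", "foo bar", "   ", "") {
    my @implicit = split;               # bare split: ' ' pattern on $_
    my @explicit = explicit_split($_);
    my $same = @implicit == @explicit
        && join("|", @implicit) eq join("|", @explicit);
    printf "%-20s %s\n", "'$_'", $same ? "same" : "different";
}
```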

    This is further explained:

    A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field.

    ... and first can be last, and we have said something about the last field if it's empty but nothing about the first field, so no problem here either (except for split(/\s+/, '', -1), which produces an empty list--but that's another issue, and it too is documented in perlfunc a couple of paragraphs above: "Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.").
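
    Both documented behaviours can be observed with a short sketch, assuming a sample string with a single leading space:

```perl
use strict;
use warnings;

my $str = " a b";

my @regex   = split /\s+/, $str;    # leading whitespace yields a null first field
my @special = split ' ',   $str;    # leading whitespace is skipped instead

my @empty   = split /\s+/, '', -1;  # empty EXPR, even with a negative LIMIT

printf "/\\s+/: %d fields, ' ': %d fields, empty string: %d fields\n",
    scalar @regex, scalar @special, scalar @empty;
```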

    I really believe that the magical disappearance of the leading empty field is documented enough to justify my suggestion. If one really, really feels it's out of place to not mention this special case in the same sentence or paragraph (which would be a real pain if it always was done in the perldocs, as Perl is full of special cases), just add a parenthetical that says "except for the special ' ' pattern; see below".

    Not mentioning leading empty fields avoids the conflict of how to choose whether ('')[0] is a leading or trailing empty field and at the same time reduces complexity of the documentation.
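
    The ambiguity shows up when every field is empty. A small sketch, assuming a comma separator:

```perl
use strict;
use warnings;

# Every field is empty, so each one is both "leading" and "trailing".
my @only_empty = split /,/, ",";        # all fields removed as trailing
my @with_limit = split /,/, ",", -1;    # negative LIMIT keeps both

# A leading empty field followed by a non-empty one is preserved.
my @real_lead  = split /,/, ",a";

printf "only empty: %d, with LIMIT -1: %d, leading then 'a': %d\n",
    scalar @only_empty, scalar @with_limit, scalar @real_lead;
```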

    ihb

      I forgot the other case where a leading empty field is discarded: a zero-width match at the beginning of the string. And I feel the split; and split ' ' case is significant enough to warrant bringing up the concept of empty leading fields there at the top, while still leaving the details of when they are preserved and when they are not to come later. That is, I am happy with how it is now (after Re^3: splitting nothing?).
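
      A sketch of that zero-width case next to a positive-width match at the start of the string, assuming simple sample input:

```perl
use strict;
use warnings;

# The empty pattern matches with zero width at position 0: no empty first field.
my @chars = split //, "abc";

# A positive-width match at position 0 does produce an empty first field.
my @comma = split /,/, ",abc";

printf "zero-width: %d fields, positive-width: %d fields\n",
    scalar @chars, scalar @comma;
```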

      If you want to submit a patch, see perlhack for instructions.

        bringing up the concept of empty leading fields there at the top

        This can still be done without adding another rule to how split works. My suggestion is to attack the problem from a different angle and avoid the conflict by saying something like: "in some special cases expected leading empty fields are deleted; see below". Then we don't say they should be preserved, and the conflict is avoided.

        ihb