in reply to splitting nothing?

I doubt this will change. Can you suggest a rewording of the doc? "empty leading non-trailing fields are preserved, and empty trailing ones are deleted." just doesn't quite cut it :)

Re^2: splitting nothing?
by bageler (Hermit) on Jul 13, 2004 at 23:53 UTC
    perhaps "If there are zero non-empty matches, all are treated as empty trailing fields and are deleted."
      How does this look:
      --- perlfunc.pod.orig 2004-06-01 05:37:39.000000000 -0700
      +++ perlfunc.pod 2004-07-13 17:02:48.436164800 -0700
      @@ -4986,7 +4986,7 @@
       
       Splits the string EXPR into a list of strings and returns that list.  By
       default, empty leading fields are preserved, and empty trailing ones are
      -deleted.
      +deleted.  (If all fields are empty, they are considered to be trailing.)
       
       In scalar context, returns the number of fields found and splits into
       the C<@_> array.  Use of split in scalar context is deprecated, however,
        sounds clear to me :)
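A small sketch of the behaviour the proposed parenthetical describes. The sample strings and variable names are made up for illustration; only the default (no LIMIT) form of split is used.

```perl
use strict;
use warnings;

# Leading empty fields are kept; trailing empty fields are dropped.
my @mixed = split /,/, ',,a,,';      # fields: '', '', 'a'

# Every field is empty, so all of them count as trailing and are dropped.
my @all_empty = split /,/, ',,,';    # no fields at all

printf "mixed: %d field(s)\n",     scalar @mixed;
printf "all empty: %d field(s)\n", scalar @all_empty;
```

The second call is the case the thread is about: read literally, "empty leading fields are preserved" would suggest that something comes back, but the list is empty.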
Re^2: splitting nothing?
by ihb (Deacon) on Jul 16, 2004 at 21:44 UTC

    Why even mention empty leading fields?

    My suggestion is to change

    Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted.
    to
    Splits the string EXPR into a list of strings and returns that list. By default, empty trailing fields are deleted.

    Update: Suggested patch: (two changes)

    4875,4876c4875
    < default, empty leading fields are preserved, and empty trailing ones are
    < deleted.
    ---
    > default, empty trailing fields are deleted.
    4953c4952
    < whitespace produces a null first field.  A C<split> with no arguments
    ---
    > whitespace may produce a null first field.  A C<split> with no arguments

    See Re^4: splitting nothing? for motivation.

    ihb

      Because that's only the default. split " " (but not split / /) doesn't preserve leading empty fields.
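To make the distinction concrete, here is a minimal sketch comparing the special ' ' pattern with the regex / / on the same input. The input string and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = '  a b';

# Special ' ' pattern: leading whitespace is skipped before splitting.
my @awk_mode = split ' ', $line;   # fields: 'a', 'b'

# Regex matching one space: each leading space yields an empty leading field.
my @literal = split / /, $line;    # fields: '', '', 'a', 'b'

print scalar(@awk_mode), " field(s) vs ", scalar(@literal), " field(s)\n";
```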

        I was anticipating this very answer, but didn't want to clobber my first post and hoped I wouldn't have to write this reply. :-)

        Short version:

        It's unnecessary to mention leading empty fields in that paragraph as default behaviour because

        • where this sentence currently stands, there's no specification on how split() works--just what it returns,
        • there's only one case that doesn't produce an otherwise expected empty leading field (singular),
        • there's a conflict of whether a list with only empty fields holds leading or trailing empty fields as they can't be considered both in this case,
        • the only case that doesn't produce an expected leading empty field (singular) is well documented and is already written in a way that doesn't conflict with trailing empty fields, and
        • it reduces complexity of the documentation without losing any information.

        See Re^2: splitting nothing? for a suggested documentation patch.

        Update: This doesn't mean that clarifications of how split() works shouldn't be made. I'm just arguing that adding yet another rule about how it works isn't the way to go, and that by removing the sentence in question we actually make the documentation of split clearer.

        The really long version for the particularly interested:

        As I see it, there are at least two ways to solve this. One way is that we do as the patch at Re^3: splitting nothing? does and introduce yet more complexity by saying that empty leading fields that also are empty trailing fields aren't empty leading fields but empty trailing fields. Another is to attack the problem at the root and not confuse the reader with leading empty fields at all.

        You can tell split() not to ignore trailing empty fields. However, you cannot tell split() to disregard leading empty fields in the general case--it's only done for a particular case (if one chooses to look at it as removal of empty fields rather than skipping of leading whitespace--see below). For me, it's more confusing to call it a default behaviour than to just document the special case.
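A short sketch of that asymmetry, assuming a comma-separated sample string. A negative LIMIT keeps trailing empty fields, while there is no matching switch for leading ones, so the last step removes them by hand (the shift loop is just one possible way to do that, not something the documentation prescribes).

```perl
use strict;
use warnings;

my $record = ',a,,';

my @default = split /,/, $record;       # fields: '', 'a'
my @keep    = split /,/, $record, -1;   # fields: '', 'a', '', ''

# No LIMIT value drops leading empty fields; strip them manually instead.
my @no_leading = @default;
shift @no_leading while @no_leading && $no_leading[0] eq '';

printf "%d default, %d with LIMIT -1, %d after manual strip\n",
    scalar @default, scalar @keep, scalar @no_leading;
```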

        This "undefault" behaviour is already explained in the documentation:

        If PATTERN is also omitted, splits on whitespace (after skipping any leading whitespace).

        As we see, the documentation already resolves this issue by saying that for this special case the leading whitespace is skipped, rather than first splitting on it and then removing the resulting empty leading field. (My English isn't good enough to judge whether the documentation should put whitespace in the plural or the singular, or whether the documentation can be interpreted as splitting on /\s/ rather than /\s+/.)

        split; is equivalent to do { split /\s+/, /\s*(.*)/s && $1 } for defined values of $_. The /\s+/ pattern would produce at most one leading empty field, which makes it excessive and confusing to talk about leading empty fields in the plural.
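The claimed equivalence can be spot-checked with a few inputs. This is only a sketch: the sample strings are arbitrary, and agreeing on them does not prove the two forms match for every defined $_.

```perl
use strict;
use warnings;

my $all_same = 1;

# The foreach loop aliases $_ to each sample in turn.
for ("  a b  ", "a\tb\nc", "   ", "") {
    my @implicit = split;                                  # split ' ', $_
    my @explicit = do { split /\s+/, /\s*(.*)/s && $1 };   # form from the post

    # Compare both the field count and the field contents.
    $all_same = 0
        unless @implicit == @explicit
            && join("\0", @implicit) eq join("\0", @explicit);
}

print $all_same ? "same results\n" : "results differ\n";
```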

        This is further explained:

        A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field.

        ... and first can be last, and we have said something about the last field if it's empty but nothing about the first field, so there's no problem here either (except for split(/\s+/, '', -1), which produces an empty list, but that's another issue and is also documented in perlfunc a couple of paragraphs above: "Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.").
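A sketch of the "first can be last" case, using a whitespace-only string chosen for illustration. The null first field produced by the leading whitespace is also a trailing field, so it disappears unless a negative LIMIT is given; the empty string stays empty either way.

```perl
use strict;
use warnings;

# Whitespace only: the null first field is also trailing, so it is dropped.
my @only_ws = split /\s+/, '   ';            # no fields

# With LIMIT -1 the empty fields on both sides of the match survive.
my @only_ws_kept = split /\s+/, '   ', -1;   # fields: '', ''

# The empty string yields the empty list regardless of LIMIT.
my @empty_kept = split /\s+/, '', -1;        # no fields

printf "%d, %d, %d\n",
    scalar @only_ws, scalar @only_ws_kept, scalar @empty_kept;
```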

        I really believe that the magical disappearance of the leading empty field is documented well enough to justify my suggestion. If one really feels it's out of place not to mention this special case in the same sentence or paragraph (which would be a real pain if it were always done in the perldocs, as Perl is full of special cases), just add a parenthetical that says "except for the special ' ' pattern; see below".

        Not mentioning leading empty fields avoids the conflict of how to choose whether ('')[0] is a leading or trailing empty field and at the same time reduces complexity of the documentation.

        ihb