A coworker ran into a problem parsing some data. Here's an example:
$_ = "|" x 5;
@foo = split /\|/, $_;
print scalar @foo; # prints 0
He expected @foo to have 6 elements. According to the Perl documentation:
Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted.
So if there are no fields to be parsed by the split, then it treats them all as empty trailing fields. My coworker said, "I think of them as empty leading fields!" The solution is to split as such:
@foo = split /\|/,$_,-1
which acts as an arbitrarily large limit and keeps the empty trailing fields, but that doesn't seem like the right (read: Perl) way for things to happen. I'd expect Perl to gracefully treat complete pseudo-emptiness as something, rather than nothing.
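
For comparison, here is a small sketch of the default behaviour next to a negative LIMIT. The variable names and the second input string ("|a||") are mine, added only for illustration:

```perl
use strict;
use warnings;

my $line = "|" x 5;    # "|||||": five separators, six empty fields

# Default: every field is empty, so all of them count as trailing
# and are removed.
my @default = split /\|/, $line;

# Negative LIMIT: trailing empty fields are kept.
my @keep = split /\|/, $line, -1;

# A mixed case: the leading empty field survives by default,
# the two trailing ones do not.
my @mixed_default = split /\|/, "|a||";
my @mixed_keep    = split /\|/, "|a||", -1;

printf "all empty: default=%d, limit -1=%d\n",
    scalar @default, scalar @keep;
printf "mixed:     default=%d, limit -1=%d\n",
    scalar @mixed_default, scalar @mixed_keep;
```

So the leading empty field is only preserved when some non-empty field follows it; otherwise it is indistinguishable from a trailing one.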

Re: splitting nothing?
by etcshadow (Priest) on Jul 13, 2004 at 22:18 UTC
    Not the same problem (as your issue can be easily worked around by adding a -1 as the last arg to split), but when you mention "splitting nothing" what comes to mind is an annoying bug I was helping someone fix recently. It basically boils down to this:
    my @empty1 = ();
    my $empty1 = join ",", @empty1;
    @empty1 = split ",", $empty1;

    my @empty2 = ("");
    my $empty2 = join ",", @empty2;
    @empty2 = split ",", $empty2;
    So, why is that a problem? Well, $empty1 and $empty2 both come out the same (the empty string), and reversing the join with a split causes the two distinct cases to collapse.
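
A runnable sketch of that collapse, with variable names of my own choosing. It also checks the -1 limit, which does not help here because splitting an empty string yields no fields regardless of LIMIT:

```perl
use strict;
use warnings;

my @none      = ();      # no elements at all
my @one_empty = ("");    # one element, the empty string

# Both lists join to the same empty string.
my $joined_none = join ",", @none;
my $joined_one  = join ",", @one_empty;

# Splitting the empty string gives back an empty list in both cases.
my @back_none = split /,/, $joined_none;
my @back_one  = split /,/, $joined_one;

# A negative LIMIT makes no difference for an empty input string.
my @back_one_keep = split /,/, $joined_one, -1;

printf "joined equal: %s\n", $joined_none eq $joined_one ? "yes" : "no";
printf "round trip sizes: %d, %d, %d\n",
    scalar @back_none, scalar @back_one, scalar @back_one_keep;
```

If the distinction matters, the element count has to travel separately from the joined string; the string alone cannot carry it.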

    Anyway, just a fun gotcha that your title reminded me of.

    ------------ :Wq Not an editor command: Wq
Re: splitting nothing?
by Zaxo (Archbishop) on Jul 13, 2004 at 21:37 UTC

    Well, dwimmerie doesn't count for much if it's not what you meant. One of the CSV modules will do what you want in a prettier fashion.

    After Compline,
    Zaxo

Re: splitting nothing?
by ysth (Canon) on Jul 13, 2004 at 21:57 UTC
    I doubt this will change. Can you suggest a rewording of the doc? "empty leading non-trailing fields are preserved, and empty trailing ones are deleted." just doesn't quite cut it :)
      Perhaps "If there are zero non-empty matches, all are treated as empty trailing fields and are deleted."
        How does this look:
        --- perlfunc.pod.orig 2004-06-01 05:37:39.000000000 -0700
        +++ perlfunc.pod 2004-07-13 17:02:48.436164800 -0700
        @@ -4986,7 +4986,7 @@
         Splits the string EXPR into a list of strings and returns that list. By
         default, empty leading fields are preserved, and empty trailing ones are
        -deleted.
        +deleted. (If all fields are empty, they are considered to be trailing.)

         In scalar context, returns the number of fields found and splits into
         the C<@_> array. Use of split in scalar context is deprecated, however,

      Why even mention empty leading fields?

      My suggestion is to change

      Splits the string EXPR into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted.
      to
      Splits the string EXPR into a list of strings and returns that list. By default, empty trailing fields are deleted.

      Update: Suggested patch: (two changes)

      4875,4876c4875
      < default, empty leading fields are preserved, and empty trailing ones are
      < deleted.
      ---
      > default, empty trailing fields are deleted.
      4953c4952
      < whitespace produces a null first field. A C<split> with no arguments
      ---
      > whitespace may produce a null first field. A C<split> with no arguments

      See Re^4: splitting nothing? for motivation.

      ihb

        Because that's only the default. split " " (but not split / /) doesn't preserve leading empty fields.
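
A short sketch of that difference, using a made-up input string with two leading spaces:

```perl
use strict;
use warnings;

my $text = "  foo bar";    # two leading spaces (illustrative input)

# The special case split " " skips leading whitespace first,
# so no empty leading field is produced.
my @awk_style = split " ", $text;

# The regex / / matches each single space, including the two at
# the start, so two empty leading fields are preserved.
my @regex_style = split / /, $text;

printf "split \" \": %d fields\n", scalar @awk_style;
printf "split / /: %d fields\n", scalar @regex_style;
```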