wrog has asked for the wisdom of the Perl Monks concerning the following question:

It seems really unlikely that I've stumbled onto a Perl bug in something that's been around basically forever. But at the very best this seems to be badly misdocumented. Either that or I'm incapable of reading, which is always possible.

Here's the problem:

split 'x','x'

in list context returns a completely empty list. This happens both in 5.10 and 5.16, and, I suspect, every other version as well.

Seems to me it should be returning ('',''), since 'x' is two empty strings separated by 'x'.

Yes, I'm aware there's all sorts of strange legacy shit in split -- in particular, lots of discretion to throw out empty components when the separator pattern is matching on empty strings, or when it's the special-case " " pattern designed to be backwards compatible with awk -- but the pattern 'x' doesn't seem to be covered by any of the weird cases and should always be doing positive-width matches.

Nor does changing it to /x/ make any difference.

And perlfunc.pod#split is also really clear that

"An empty leading field is produced when there is a positive-width match at the beginning of EXPR. ...

An empty trailing field, on the other hand, is produced when there is a match at the end of EXPR, regardless of the length of the match (of course, unless a non-zero LIMIT is given explicitly, such fields are removed, as in the last example). ..."

And you'll notice above that I'm not setting any LIMITs.

So what am I missing? Where are my empty leading and trailing fields?

Is there a way to get split to return n components when there are n-1 'x's? (I already know how to do this with regexps, but it looks gross...)

Replies are listed 'Best First'.
Re: split tosses away empty components even with positive width separators?
by RichardK (Parson) on May 06, 2013 at 01:33 UTC

    The help for split says this, so I think it's expected behaviour (but it is very late here!)

    If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative but with the exception that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case).
      Ah, I need to set LIMIT = -1. Thank you!
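A minimal sketch of the difference, assuming any reasonably recent Perl 5 (the variable names are only illustrative). It contrasts the default LIMIT, where all-empty fields count as trailing and are stripped, with a negative LIMIT, which keeps them:

```perl
use strict;
use warnings;

# LIMIT omitted: trailing empty fields are stripped. Every field of 'x'
# is empty, so all of them count as trailing and nothing is left.
my @stripped = split /x/, 'x';

# Negative LIMIT: no cap on the number of fields, trailing empties kept.
my @kept = split /x/, 'x', -1;

# With a negative LIMIT, n-1 separators yield n fields.
my @three = split /x/, 'xx', -1;

printf "default LIMIT: %d field(s)\n", scalar @stripped;
printf "LIMIT = -1:    %d field(s)\n", scalar @kept;
printf "'xx', -1:      %d field(s)\n", scalar @three;
```

So for the original question, passing -1 as the third argument appears to be the way to get n components from n-1 separators.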
Re: split tosses away empty components even with positive width separators?
by Kenosis (Priest) on May 06, 2013 at 01:32 UTC

    Consider the following:

    use strict;
    use warnings;
    my $empty = '';
    my $newString = $empty . $empty . $empty . $empty;

    Is $newString now comprised of four empty strings? Continuing with the above, what about the following:

    my $forSplit = $newString . 'x' . $newString;
    my @list = split 'x', $forSplit;

    Would you expect @list to now contain the following?

    '','','','','','','',''
      Is $newString now comprised of four empty strings?
      No, because you're combining your $empty's using an empty separator, which I fully expect to be weird since you can't tell where the original boundaries were, and as I noted, is already covered in the documentation. The question is about positive-width separators...

        I'm not sure I understand the phrase "positive-width separators." An adjacency google search shows zero results for that phrase, except for your usage of it in this node and related reply.

Re: split tosses away empty components even with positive width separators?
by hdb (Monsignor) on May 06, 2013 at 06:57 UTC

    If you run split /(x)/, 'x';, you get at least your leading empty field plus one with the separator.
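A short sketch of the capturing-group variant mentioned above (variable names are hypothetical). Because the captured separator is itself returned as a non-empty field, the leading empty field is no longer "trailing" and survives even with the default LIMIT:

```perl
use strict;
use warnings;

# Capturing parentheses make split return the separator as a field too.
# The field after the separator is empty and trailing, so it is stripped.
my @with_sep = split /(x)/, 'x';

# With a negative LIMIT the trailing empty field is kept as well.
my @with_sep_all = split /(x)/, 'x', -1;

print join(',', map { "'$_'" } @with_sep), "\n";
print join(',', map { "'$_'" } @with_sep_all), "\n";
```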

Re: split tosses away empty components even with positive width separators?
by Laurent_R (Canon) on May 06, 2013 at 11:05 UTC

    It is not a bug, it is a feature. ;-)

    A list of two empty lists is an empty list. There is nothing tremendously paradoxical about it; it is quite natural and common sense (a different design decision could have been made, but it would probably have led to useless complexity).

      Perhaps so, if we were talking about a list of two empty lists, but a list of two empty strings is a totally different animal...

        Well, one must say that the behavior shown in the following debugger session does not seem to be completely consistent.

          DB<1> x split /-/, "-"
          empty array
          DB<2> x split /-/, "-foo-bar-"
        0  ''
        1  'foo'
        2  'bar'
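The same comparison outside the debugger, as a rough sketch (the variable names are illustrative). The leading empty field in "-foo-bar-" survives because non-empty fields follow it, whereas in "-" every field is empty and so all are treated as trailing. A LIMIT of -1 makes the two cases behave alike:

```perl
use strict;
use warnings;

# Default LIMIT: the leading '' is kept only when a non-empty field follows it.
my @lone  = split /-/, '-';
my @mixed = split /-/, '-foo-bar-';

# Negative LIMIT: both strings keep every empty field.
my @lone_all  = split /-/, '-', -1;
my @mixed_all = split /-/, '-foo-bar-', -1;

for my $result (\@lone, \@mixed, \@lone_all, \@mixed_all) {
    printf "%d field(s): (%s)\n", scalar @$result,
        join(', ', map { "'$_'" } @$result);
}
```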