Marshall has asked for the wisdom of the Perl Monks concerning the following question:

I have known for a long time that these two things are not equivalent. The difference is in how whitespace at the beginning of the string is handled. I seek an explanation of why?
    use strict;
    use warnings;
    use Data::Dumper;

    my $line = " X Y Z A\n";

    my @tokens = split ' ', $line;
    print Dumper \@tokens;

    @tokens = split /\s+/, $line;
    print Dumper \@tokens;

    __END__
    $VAR1 = [
              'X',
              'Y',
              'Z',
              'A'
            ];
    $VAR1 = [
              '',
              'X',
              'Y',
              'Z',
              'A'
            ];

Re: Why is split ' ' ne split /\s+/?
by Preceptor (Deacon) on Jul 05, 2016 at 09:25 UTC

    Fundamentally, because both behaviours are useful, and this way you can choose which one you get.

    Discarding leading spaces is one of the more common scenarios when parsing text.

Re: Why is split ' ' ne split /\s+/?
by Anonymous Monk on Jul 05, 2016 at 00:59 UTC

    I seek an explanation of why?

    :) study history :D

      I do not consider that a particularly helpful answer.
        Magic split, using the space, is spelled out in perldoc -f split as such:

        As another special case, split emulates the default behavior of the command line tool awk when the PATTERN is either omitted or a string composed of a single space character (such as ' ' or "\x20", but not e.g. / /). In this case, any leading whitespace in EXPR is removed before splitting occurs, and the PATTERN is instead treated as if it were /\s+/; in particular, this means that any contiguous whitespace (not just a single space character) is used as a separator. However, this special treatment can be avoided by specifying the pattern / / instead of the string " ", thereby allowing only a single space character to be a separator. In earlier Perls this special case was restricted to the use of a plain " " as the pattern argument to split; in Perl 5.18.0 and later this special case is triggered by any expression which evaluates to the simple string " ".

        If omitted, PATTERN defaults to a single space, " ", triggering the previously described awk emulation.
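
To make the quoted documentation concrete, here is a minimal sketch that runs the three patterns it mentions against one input. The sample string, the variable names and the show helper are illustrative assumptions, not taken from the thread; the last case spells out by hand what the documentation says awk mode does (strip leading whitespace, then split on /\s+/).

```perl
use strict;
use warnings;

# Illustrative input: two leading spaces, one double space, a trailing newline.
my $line = "  X Y  Z A\n";

# Hypothetical helper: print each field in quotes, showing newlines as \n.
sub show {
    my ($label, @fields) = @_;
    my @quoted = map { my $f = $_; $f =~ s/\n/\\n/g; "'$f'" } @fields;
    printf "%-14s %s\n", $label, join(', ', @quoted);
}

# Awk mode: leading whitespace is removed first, then runs of whitespace separate fields.
my @awk = split ' ', $line;

# Plain regex: the leading whitespace matches as a separator, so the first field is empty.
my @regex = split /\s+/, $line;

# Single-space regex: every individual space is a separator; the newline stays in the last field.
my @single = split / /, $line;

# Awk mode written out by hand, following the perldoc description.
(my $trimmed = $line) =~ s/^\s+//;
my @manual = split /\s+/, $trimmed;

show("split ' '",     @awk);     # 'X', 'Y', 'Z', 'A'
show('split /\s+/',   @regex);   # '', 'X', 'Y', 'Z', 'A'
show('split / /',     @single);  # '', '', 'X', 'Y', '', 'Z', 'A\n'
show('trim + /\s+/',  @manual);  # 'X', 'Y', 'Z', 'A'
```

In all four cases trailing empty fields are dropped, which is why the trailing newline never produces an extra element under /\s+/. Only the leading empty field distinguishes split ' ' from split /\s+/.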