FWIW, split ' ', $data is magical in that the ' ' matches any quantity of whitespace, i.e. spaces, tabs, newlines. Syntactic sugar for \s+, really.

cheers

tachyon

Re: reg ex help
by Abigail-II (Bishop) on Apr 15, 2004 at 10:43 UTC
    FWIW, split ' ', $data is magical in that the ' ' matches any quantity of whitespace, i.e. spaces, tabs, newlines. Syntactic sugar for \s+, really.
    split ' ' and split /\s+/ are not quite the same:
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = " foo bar baz ";
    my @a = split ' ';
    my @b = split /\s+/;
    print scalar @a, "\n";
    print scalar @b, "\n";
    __END__
    3
    4

    Abigail

      Thanks for clarifying that ' ' is DWIM (magical) with respect to leading whitespace.

      #!/usr/bin/perl
      use Data::Dumper;

      $_ = " foo bar baz ";
      my @a = split ' ';
      my @b = split /\s+/;
      print Dumper \@a;
      print Dumper \@b;
      __DATA__
      $VAR1 = [ 'foo', 'bar', 'baz' ];
      $VAR1 = [ '', 'foo', 'bar', 'baz' ];

      What interests me from a 'why is it so?' point of view is that if \s+ splits at the beginning of a string to find the NULL, why not at the END as well? Your example documents the behaviour, but why is leading whitespace treated differently from trailing whitespace? There is, after all, a null string after the trailing whitespace as well.

      cheers

      tachyon

        What interests me from a 'why is it so?' point of view is that if \s+ splits at the beginning of a string to find the NULL, why not at the END as well? Your example documents the behaviour, but why is leading whitespace treated differently from trailing whitespace? There is, after all, a null string after the trailing whitespace as well.
        The defaults of split are to ignore trailing empty fields, and to keep leading empty fields. Why these are the defaults, I can only speculate. Leaving off trailing empty fields is relatively harmless: an empty string is false, and so is a non-existent array element. But in many cases, leaving off empty leading fields only wreaks havoc. Suppose you have some tabulated process data: controlling terminal, PID, UID, process name, arguments. Some processes don't have arguments, and some don't have a controlling terminal. If you leave off the empty arguments field, there's no harm. But if you leave off the empty controlling terminal field, then in the resulting list the PID is suddenly in position 0, not position 1.
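
The process-table scenario can be sketched roughly as follows. The record is made up for illustration (tab-separated, with no controlling terminal and no arguments), and the variable names are hypothetical. The third argument to split, LIMIT, is standard: a negative value asks split to keep trailing empty fields too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tab-separated record: TTY, PID, UID, name, arguments.
# This process has no controlling terminal and no arguments,
# so both the first and the last field are empty.
my $line = "\t4242\t0\tcron\t";

# Default behaviour: the leading empty field is kept,
# the trailing empty field is dropped.
my @default = split /\t/, $line;

# A negative LIMIT keeps trailing empty fields as well.
my @all = split /\t/, $line, -1;

print scalar(@default), "\n";   # 4 fields: '', 4242, 0, cron
print scalar(@all), "\n";       # 5 fields: '', 4242, 0, cron, ''
print "PID is $default[1]\n";   # the PID stays in position 1
```

Because the empty terminal field survives, the PID is at index 1 in both lists; only the harmless trailing field differs.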

        As for split ' ' leaving off leading empty fields, this is the exception, and specifically done to simulate the behaviour of AWK.
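
As a rough illustration of that special case (a sketch, not a statement about how split is implemented): split ' ' behaves like stripping leading whitespace by hand and then splitting on /\s+/. A pattern of / / (a regex containing one space) does not get the special treatment. The sample strings and variable names below are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "  foo\tbar\n baz  ";

# The AWK-style special case: leading whitespace is skipped.
my @awk = split ' ', $text;

# Roughly the same result by hand: trim leading whitespace, then split on \s+.
(my $trimmed = $text) =~ s/^\s+//;
my @manual = split /\s+/, $trimmed;

print "same\n" if "@awk" eq "@manual";

# / / is an ordinary pattern matching exactly one space, so the
# leading empty field and the empty field between two spaces are kept.
my @single = split / /, " a  b";
print scalar(@single), "\n";    # 4 fields: '', a, '', b
```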

        Abigail