in reply to Re: Re: reg ex help
in thread reg ex help

What interests me from a 'why is it so?' point of view is that if \s+ splits at the beginning of a string to find the null string, why not at the END as well? Your example documents the behaviour, but why is leading whitespace treated differently from trailing whitespace? There is, after all, a null string after the trailing whitespace as well.
By default, split drops trailing empty fields and keeps leading empty fields. Why these are the defaults, I can only speculate. Leaving off trailing empty fields is relatively harmless: an empty string is false, and so is a nonexistent array element. But in many cases, leaving off leading empty fields only wreaks havoc. Suppose you have some tabulated process data: controlling terminal, PID, UID, process name, arguments. Some processes don't have arguments, and some don't have a controlling terminal. If you leave off the empty arguments field, there's no harm. But if you leave off the empty controlling terminal field, every later field shifts down by one in the resulting list, and the PID is suddenly in position 0, not position 1.
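
To make the process-table case concrete, here is a minimal sketch. The tab-separated records and their values are made up for illustration; only the behaviour of split itself is the point.

```perl
use strict;
use warnings;

# Hypothetical records: terminal, PID, UID, name, arguments (tab-separated).
my $full = "tty1\t99\t1000\tvim\tnotes.txt";
my $bare = "\t42\t0\tinit\t";    # no controlling terminal, no arguments

my @full = split /\t/, $full;
my @bare = split /\t/, $bare;

# The leading empty field is kept, so the PID stays at index 1 in both lists.
print "PID: $full[1]\n";
print "PID: $bare[1]\n";

# The trailing empty field is dropped, so @bare is one element shorter.
print scalar(@full), " fields vs ", scalar(@bare), " fields\n";
```

The missing arguments field simply isn't there in @bare, which does no harm, while the empty terminal field is still at index 0 and keeps everything else lined up.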

As for split ' ' leaving off leading empty fields: that is the exception, done specifically to emulate the behaviour of AWK.
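
A small sketch of the difference, using a made-up input string with whitespace on both ends:

```perl
use strict;
use warnings;

my $str = "  a b  ";    # illustrative input only

# Ordinary pattern: the leading empty field is kept, trailing ones dropped.
my @regex = split /\s+/, $str;    # ('', 'a', 'b')

# The special case split ' ': leading whitespace is skipped first, as in AWK.
my @awk = split ' ', $str;        # ('a', 'b')

print scalar(@regex), " fields with /\\s+/\n";
print scalar(@awk), " fields with ' '\n";
```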

Abigail

Re: Re: reg ex help
by pelagic (Priest) on Apr 15, 2004 at 12:06 UTC
    ... and you can still set LIMIT to -1
    split /PATTERN/, EXPR, LIMIT
    if you don't want your trailing empties to be stripped.
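
For instance, a quick sketch with a made-up comma-separated string:

```perl
use strict;
use warnings;

my $csv = "a,b,,,";    # illustrative input only

# Default: trailing empty fields are stripped.
my @default = split /,/, $csv;        # ('a', 'b')

# Negative LIMIT: every field is returned, trailing empties included.
my @all = split /,/, $csv, -1;        # ('a', 'b', '', '', '')

print scalar(@default), " fields by default\n";
print scalar(@all), " fields with LIMIT -1\n";
```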

    pelagic