in reply to Re: New Problem
in thread Tagging the last elements

The default split is split (' ',$_);. split (/\s+/,$_); is close but not identical: with /\s+/, any leading whitespace produces an empty first field.

Correction as per graff: split ' ',$_ will split on whitespace. I always put a regex in there, but this alternate syntax is completely legal. I wasn't sure that the split(" ",$_); above even works, since split normally takes a regex as the pattern rather than a char string, but a double-quoted single space is the same special case as ' '. It is the regex / / that behaves differently: it splits on each single literal space.
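A quick way to check the quoting question is to run both forms on the same line. This is only a sketch; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line: leading spaces, a double space, a tab, a newline.
my $line = "  alpha  beta\tgamma\n";

my @single = split ' ', $line;   # single-quoted space
my @double = split " ", $line;   # double-quoted space

# Both forms should give the same three fields.
print join('|', @single), "\n";   # alpha|beta|gamma
print join('|', @double), "\n";   # alpha|beta|gamma
```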

Anyway, splitting on a single space (or tab) is not the same as splitting on a sequence of whitespace characters. The \s class covers at least 5 chars: space, \f, \r, \n and \t, and /\s+/ will split on any run of them. Since you can't actually see a whitespace char, questions like "is that one space, two spaces or a tab?" can be problematic.
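To make the difference concrete, here is a small sketch that splits the same string on a single literal space and on a whitespace run. The input string is an invented example.

```perl
use strict;
use warnings;

# Hypothetical input: "a", three spaces, "b", a tab, "c".
my $line = "a   b\tc";

my @by_space = split / /,   $line;   # every single space is a separator
my @by_ws    = split /\s+/, $line;   # any run of whitespace is one separator

# / / yields empty fields between adjacent spaces and leaves the tab inside a field.
print scalar(@by_space), " fields with / /\n";     # 4 fields with / /
print scalar(@by_ws),    " fields with /\\s+/\n";  # 3 fields with /\s+/
```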

An interesting thing about this: when processing normal text lines, there is no need to "chomp" when using /\s+/, because \n is one of the split characters and split drops trailing empty fields.
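A minimal sketch of the chomp point, using an invented line that still carries its trailing newline:

```perl
use strict;
use warnings;

# Hypothetical line as it would come from <STDIN> or a file, newline included.
my $line = "one two three\n";

my @fields  = split /\s+/, $line;   # \n is consumed as a separator
my @literal = split / /,   $line;   # \n is NOT a separator here

print "last field with /\\s+/: [$fields[-1]]\n";   # [three]
print "newline kept with / /\n" if $literal[-1] eq "three\n";
```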

Re^3: New Problem
by graff (Chancellor) on Jul 30, 2009 at 04:47 UTC
    From the "perlfunc" manual description of split:

    ... If PATTERN is ... omitted, splits on whitespace (after skipping any leading whitespace)... {3rd paragraph}

    ...

    As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many null initial fields as there are leading spaces. A "split" on "/\s+/" is like a "split(' ')" except that any leading whitespace produces a null first field. A "split" with no arguments really does a "split(' ', $_)" internally. {about 7 paragraphs further down}
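The three behaviors described in that passage can be compared side by side. This is an illustrative sketch with a made-up input that has two leading spaces.

```perl
use strict;
use warnings;

# Hypothetical input with exactly two leading spaces.
my $line = "  foo bar";

my @awk = split ' ',   $line;   # leading whitespace skipped
my @re  = split /\s+/, $line;   # leading whitespace gives one empty first field
my @sp  = split / /,   $line;   # one empty field per leading space

# Bracket each field so the empty ones are visible.
print join('', map { "[$_]" } @awk), "\n";   # [foo][bar]
print join('', map { "[$_]" } @re),  "\n";   # [][foo][bar]
print join('', map { "[$_]" } @sp),  "\n";   # [][][foo][bar]
```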