in reply to New Problem
in thread Tagging the last elements

I'm guessing that the whitespace separating the fields on each line of input may be variable in nature -- not just a single "\t" every time (e.g. sometimes it may be tab preceded and/or followed by spaces, and sometimes it may be just spaces with no tab).

That's why I suggested the unadorned split for breaking up the input line into fields. That is equivalent to

split(" ",$_)
(note the quoted space, not a regex), which says "ignore leading white space in the string, and return the list of strings separated by any amount of any kind of white space."
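As a minimal sketch of that equivalence (the sample line and variable names here are made up for illustration), the bare split and the quoted-space form should return the same fields for input with mixed, variable whitespace:

```perl
use strict;
use warnings;

# Hypothetical input line: leading spaces, then fields separated
# by an arbitrary mix of tabs and spaces.
my $line = "  alpha \t beta   gamma\tdelta";

my @bare;
for ($line) {          # alias $_ to $line so the bare split has a target
    @bare = split;     # no arguments: behaves like split(' ', $_)
}
my @quoted = split(" ", $line);   # quoted space, not a regex

print join("|", @bare),   "\n";   # alpha|beta|gamma|delta
print join("|", @quoted), "\n";   # alpha|beta|gamma|delta
```

Both forms skip the leading whitespace and treat any run of tabs and spaces as a single separator.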

If some of your field values are expected to contain a space now and then, and your field separation is variable (not just a single "\t" every time), then you've got a problem with unparsable data, and you need to fix that first.
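To illustrate the ambiguity, here is a small sketch with an invented record that is meant to hold three fields (id, name, city), where the name contains a space. Whitespace splitting cannot tell the embedded space from a separator:

```perl
use strict;
use warnings;

# Hypothetical record: three intended fields, but the second one
# ("John Smith") contains a space of its own.
my $record = "42   John Smith \t Boston";

my @fields = split ' ', $record;

# Prints: 4 fields: 42|John|Smith|Boston
printf "%d fields: %s\n", scalar(@fields), join("|", @fields);
```

No choice of whitespace pattern recovers the intended three fields here; the data needs an unambiguous delimiter (or quoting) before it can be parsed reliably.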

(updated to fix formatting)

Re^2: New Problem
by Marshall (Canon) on Jul 30, 2009 at 04:32 UTC
    The default split is split(' ', $_). split(/\s+/, $_) is almost the same, except that leading whitespace produces an empty first field.

    Correction as per graff: split ' ',$_ will split on whitespace. I always put a regex in there, but this alternate syntax is completely legal. Although split normally takes a regex as its pattern, a string consisting of a single space is a special case, so the double-quoted split(" ",$_) above behaves the same way.

    Anyway, splitting on a single literal space (/ /) or a single tab (/\t/) is not the same as splitting on a sequence of whitespace characters. The whitespace family has 5 chars: space, \f, \r, \n and \t. /\s+/ will split on any run of them. Since you can't actually see a whitespace char, "is that one space, two spaces or a tab" or whatever can be problematic.

    An interesting thing about this is that when processing normal text lines, there is no need to "chomp" when using /\s+/, because \n is one of the split characters and the resulting empty trailing field is dropped.
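A short sketch of the chomp point, using an invented line with its newline still attached. It contrasts /\s+/ with a literal-space pattern, which leaves the newline in the last field:

```perl
use strict;
use warnings;

# Hypothetical line as read from a file, newline still attached.
my $line = "one two\tthree\n";

# "\n" counts as a separator; the empty trailing field is dropped.
my @ws = split /\s+/, $line;

# Only a literal space separates; the tab and "\n" stay in the data.
my @space = split / /, $line;

print scalar(@ws), " fields, last is '$ws[-1]'\n";   # 3 fields, last is 'three'
print "last field from / / still ends with a newline\n"
    if $space[-1] =~ /\n\z/;
```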

      From the "perlfunc" manual description of split:

      ... If PATTERN is ... omitted, splits on whitespace (after skipping any leading whitespace)... {3rd paragraph}

      ...

      As a special case, specifying a PATTERN of space (' ') will split on white space just as "split" with no arguments does. Thus, "split(' ')" can be used to emulate awk's default behavior, whereas "split(/ /)" will give you as many null initial fields as there are leading spaces. A "split" on "/\s+/" is like a "split(' ')" except that any leading whitespace produces a null first field. A "split" with no arguments really does a "split(' ', $_)" internally. {about 7 paragraphs further down}
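The three behaviours described in that passage can be compared side by side. This is only a sketch with an invented input string that has two leading spaces:

```perl
use strict;
use warnings;

# Hypothetical input with two leading spaces.
my $line = "  a b";

my @awk   = split ' ',   $line;   # awk-style: leading whitespace skipped
my @regex = split /\s+/, $line;   # leading whitespace gives one empty first field
my @lit   = split / /,   $line;   # one empty field per leading space

printf "%-12s %d fields: [%s]\n", "split ' '",    scalar(@awk),   join("|", @awk);
printf "%-12s %d fields: [%s]\n", "split /\\s+/", scalar(@regex), join("|", @regex);
printf "%-12s %d fields: [%s]\n", "split / /",    scalar(@lit),   join("|", @lit);
```

With this input the field counts are 2, 3 and 4 respectively, which matches the "null initial fields" wording in the manual.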