Moron has asked for the wisdom of the Perl Monks concerning the following question:

A user supplies a file which has a header and a number of rows. The header takes the form:

FA1 '|' FA2 '|' FA3 '|' FA4 [ '|' VA1 .. ]

where FAn are always-required attribute ids and VAn are zero or more optional attribute ids, with no fixed upper limit on their number.

The rows thereafter are values for insert or update to the database.

The problem is that when using the split command:

my ( $f1, $f2, $f3, $f4, @v ) = split( /\|/ );
(under Perl 5.6.1, if that matters) any trailing empty fields are dropped from the list. A DWIM split would convert A|B|| into ['A','B', undef, undef], and I could even cope with ['A','B','','']. In practice split returns only ['A','B'], so I cannot tell whether the user entered A|B, A|B||| or for that matter A|B||||||||||||||||||||||||||||||||||||

I can see that tuning up the regexp is probably the way to go here, but I don't know how.

Thanks in advance.

-M

Free your mind

Re: split problem when emptiness is a valid element
by inman (Curate) on Oct 04, 2005 at 16:44 UTC
    Use the limit argument on split. Specify a negative number to keep all empty records.
    use Data::Dumper;
    my @array = split /\|/, 'A|B||', -1;
    print Dumper(\@array);
    gives
    $VAR1 = [ 'A', 'B', '', '' ];
    from the docs for split:
    split /PATTERN/,EXPR,LIMIT

    ...

    If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.
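
Applied to the list assignment from the original question, the negative LIMIT might look like the sketch below. The helper name parse_row and the sample rows are made up for illustration. The chomp is an assumption about how the lines are read: without it, the newline would end up in the last field.

```perl
use strict;
use warnings;

# Hypothetical helper: split one pipe-delimited row, keeping trailing
# empty fields by passing a negative LIMIT to split.
sub parse_row {
    my ($line) = @_;
    chomp $line;    # strip the newline so it does not land in the last field
    my ( $f1, $f2, $f3, $f4, @v ) = split /\|/, $line, -1;
    return { fixed => [ $f1, $f2, $f3, $f4 ], optional => \@v };
}

# Made-up sample rows: no optional fields, two empty optional fields,
# and a row whose third and fourth fixed fields are empty.
for my $line ( "A|B|C|D\n", "A|B|C|D||\n", "A|B||\n" ) {
    my $row = parse_row($line);
    print scalar( @{ $row->{optional} } ), "\n";
}
```

This should print 0, 2 and 0: in the second row the two trailing empty fields survive as empty strings in @v, and in the third row they fill $f3 and $f4.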

Re: split problem when emptiness is a valid element
by Jenda (Abbot) on Oct 04, 2005 at 16:48 UTC
    perldoc -f split
    ...
    If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of "pop" would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.

    It's a little confusing but I think these two examples will make it clear:

    $string = "a|b|||";
    print "'" . join ("', '", split(/\|/, $string)) . "'\n";
    print "'" . join ("', '", split(/\|/, $string, -1)) . "'\n";
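
The last sentence of the quoted documentation matters if the file can contain blank lines. A small sketch of that edge case, using made-up variable names:

```perl
use strict;
use warnings;

# An empty string yields an empty list, even with a negative LIMIT.
my @from_empty = split /\|/, '', -1;

# A lone delimiter yields two empty fields when LIMIT is negative...
my @from_delim = split /\|/, '|', -1;

# ...and no fields at all when LIMIT is omitted, because every field
# is empty and trailing empty fields are stripped.
my @from_delim_default = split /\|/, '|';

print scalar(@from_empty),         "\n";
print scalar(@from_delim),         "\n";
print scalar(@from_delim_default), "\n";
```

This should print 0, 2 and 0. A blank input line therefore gives no fields, so the fixed variables in the original list assignment would be undef for it.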

    Update: I'm too slow, inman submitted his node sooner than me ;-)

    Jenda
    XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.

Re: split problem when emptiness is a valid element
by BrowserUk (Patriarch) on Oct 04, 2005 at 16:50 UTC

    If you set the third parameter to split (LIMIT) to -1, then it will produce the trailing empty fields for you (as empty strings rather than undef).

    $s = 'A|B||||||||||||||||||||||||||||||||||||';;
    print join '-', split /\|/, $s;;
    A-B
    print join '-', split /\|/, $s, -1;;
    A-B------------------------------------

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: split problem when emptiness is a valid element
by ikegami (Patriarch) on Oct 04, 2005 at 16:32 UTC

    How about

    @fields = /([^|]*)\|?/g;

    Update: You can even do:

    my ( $f1, $f2, $f3, $f4, @v ) = /([^|]*)\|?/g;

    defined will tell whether a field was provided or not.
    length will tell whether a field was left blank or not.
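
One behaviour of the global match is worth checking before relying on it: after the last field, the pattern can match once more as an empty string at the end of the input. The sketch below compares it with split and a negative LIMIT on two made-up inputs.

```perl
use strict;
use warnings;

for my $input ( 'A|B', 'A|B|' ) {
    # List-context global match: one capture per match.
    my @by_regex = $input =~ /([^|]*)\|?/g;

    # split with a negative LIMIT keeps trailing empty fields.
    my @by_split = split /\|/, $input, -1;

    printf "%-5s regex: %d fields, split: %d fields\n",
        $input, scalar @by_regex, scalar @by_split;
}
```

Here the regex should report 3 fields for both inputs, while split reports 2 and 3. With the regex alone, A|B and A|B| look the same, and $f3 would be a defined empty string even for A|B.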

Re: split problem when emptiness is a valid element
by philcrow (Priest) on Oct 04, 2005 at 16:45 UTC
    You could ask split to give you the delimiters. To do that, wrap your delimiter in parens:
    my @result = split( /(\|)/ );
    Then you get (for example, for the input A|B|||):
    [ 'A', '|', 'B', '|', '', '|', '', '|' ]
    This may or may not be easier to handle than other solutions already given.
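
A minimal sketch of how the captured delimiters could be used to recover the field count, assuming the input is A|B||| (the variable names are made up). A field can never equal the delimiter itself, so counting the '|' elements is safe.

```perl
use strict;
use warnings;

my $line = 'A|B|||';

# The capturing parens make split return the delimiters as list elements.
# The final captured '|' is not empty, so it is not stripped.
my @parts = split /(\|)/, $line;

# Each delimiter separates two fields, so fields = delimiters + 1.
my $delims = grep { $_ eq '|' } @parts;
my $fields = $delims + 1;

print "$fields\n";
```

This should print 5, which matches the number of elements that split /\|/, $line, -1 returns for the same input.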

    Phil

Re: split problem when emptiness is a valid element
by Moron (Curate) on Oct 05, 2005 at 09:14 UTC
    Thanks to all who responded. I have added the -1 limit at the relevant places in the code, and a quick test showed that split is now behaving itself. My appreciation also goes to those who attempted a workaround. After all, not all of us have enough of the manual engraved in our brains to know something as obscure as a -1 limit, and you cannot look up what you do not know exists! In fact, my colleagues had an incredulous laugh about the answer to this problem.

    -M

    Free your mind

Re: split problem when emptiness is a valid element
by blazar (Canon) on Oct 04, 2005 at 16:37 UTC
    This may be a poor workaround, but you could pad the input with '|you_know_what':
    $ perl -le 'print for map "<$_>", split /,/, "a,b,c,,,,"'
    <a>
    <b>
    <c>
    $ perl -le 'print for map "<$_>", split /,/, "a,b,c,,,,FOO"'
    <a>
    <b>
    <c>
    <>
    <>
    <>
    <FOO>

    Update: now, of course inman's solution is a superior one and to all effects the right(TM) one, I'd say. I knew about this use of the LIMIT parameter, but for some reason it didn't spring to mind...