split problem when emptiness is a valid element

Moron has asked for the wisdom of the Perl Monks concerning the following question:

A user supplies a file which has a header and a number of rows. The header takes the form:

FA1 '|' FA2 '|' FA3 '|' FA4 [ '|' VA1 .. ]

where FAn are always-required attribute ids and VAn are zero or more optional attribute ids of no particular limit in number.

The rows thereafter are values for insert or update to the database.

The problem is that when using the split command:

my ( $f1, $f2, $f3, $f4, @v ) = split( /\|/ );

(under Perl 5.6.1, if that matters) any trailing empty fields are dropped from the list. A DWIM split would convert A|B||| into ['A','B', undef, undef, undef ], and I could even cope with ['A','B','','','']. In practice, though, split returns only ['A','B'], so I cannot tell whether the user entered A|B, A|B||| or, for that matter, A|B||||||||||||||||||||||||||||||||||||

I can see that tuning up the regexp is probably the way to go here, but I don't know how.

Thanks in advance.

-M

Free your mind


Re: split problem when emptiness is a valid element
by inman (Curate) on Oct 04, 2005 at 16:44 UTC

use Data::Dumper;

my @array = split /\|/, 'A|B||', -1;

print Dumper(\@array);

$VAR1 = [
          'A',
          'B',
          '',
          ''
        ];

split /PATTERN/,EXPR,LIMIT
...
If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of pop would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.
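To make the three LIMIT cases from the quoted documentation concrete, here is a minimal sketch. The variable names are only for illustration, and the input is the same 'A|B||' string as above.

```perl
use strict;
use warnings;

my $line = 'A|B||';

my @default = split /\|/, $line;        # LIMIT omitted: trailing empty fields stripped
my @limited = split /\|/, $line, 2;     # positive LIMIT: at most 2 fields, rest left unsplit
my @all     = split /\|/, $line, -1;    # negative LIMIT: every field kept, including trailing ''

printf "%d %d %d\n", scalar @default, scalar @limited, scalar @all;   # prints "2 2 4"
print "second limited field: $limited[1]\n";                          # prints "second limited field: B||"
```

Only the negative LIMIT distinguishes A|B from A|B||, which is what the original question needs.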


Re: split problem when emptiness is a valid element
by Jenda (Abbot) on Oct 04, 2005 at 16:48 UTC

perldoc -f split
...
If LIMIT is specified and positive, it represents the maximum number of fields the EXPR will be split into, though the actual number of fields returned depends on the number of times PATTERN matches within EXPR. If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of "pop" would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified. Note that splitting an EXPR that evaluates to the empty string always returns the empty list, regardless of the LIMIT specified.

It's a little confusing but I think these two examples will make it clear:

$string = "a|b|||";
print "'" . join ("', '", split(/\|/, $string)) . "'\n";
print "'" . join ("', '", split(/\|/, $string, -1)) . "'\n";
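The empty-string caveat at the end of the quoted documentation can be checked the same way. This is a small sketch, assuming a reasonably modern Perl. The second case (a line containing only a separator) is added here for contrast and is not from the documentation excerpt.

```perl
use strict;
use warnings;

my @from_empty = split /\|/, '',  -1;   # empty input: no fields at all, even with LIMIT -1
my @from_pipe  = split /\|/, '|', -1;   # a lone separator: two empty fields

print scalar(@from_empty), " ", scalar(@from_pipe), "\n";   # prints "0 2"
```

So a completely blank row yields zero fields rather than one empty field, which may need its own check when validating user-supplied files.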

Update: I'm too slow, inman submitted his node sooner than me ;-)

Jenda
XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.


Re: split problem when emptiness is a valid element
by BrowserUk (Patriarch) on Oct 04, 2005 at 16:50 UTC

If you set the third parameter to split (LIMIT) to -1, it will keep the trailing empty fields for you (as empty strings, not undef).

$s = 'A|B||||||||||||||||||||||||||||||||||||';;
print join '-', split /\|/, $s;;
A-B

print join '-', split /\|/, $s, -1;;
A-B------------------------------------
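Applied to the header-plus-rows file from the question, the same LIMIT of -1 might be used roughly as follows. This is only a sketch: the header, the row and the variable names are made up for illustration, and real code would read and chomp each line from the user's file.

```perl
use strict;
use warnings;

# Hypothetical header and data row, for illustration only.
my $header = 'FA1|FA2|FA3|FA4|VA1|VA2';
my $row    = 'w|x|y|z||';

my ( $h1, $h2, $h3, $h4, @optional_ids ) = split /\|/, $header, -1;
my ( $f1, $f2, $f3, $f4, @v )            = split /\|/, $row,    -1;

# With LIMIT -1 an empty optional value arrives as '' instead of vanishing,
# so the number of values can be checked against the header.
die "row has the wrong number of optional fields\n"
    unless @v == @optional_ids;

for my $i ( 0 .. $#optional_ids ) {
    my $shown = length $v[$i] ? $v[$i] : '(empty)';
    print "$optional_ids[$i] = $shown\n";
}
```

Note that with LIMIT -1 a trailing newline ends up in the last field, so each line should be chomped before it is split.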

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.


Re: split problem when emptiness is a valid element
by ikegami (Patriarch) on Oct 04, 2005 at 16:32 UTC

How about

@fields = /([^|]*)\|?/g;

Update: You can even do:

my ( $f1, $f2, $f3, $f4, @v ) = /([^|]*)\|?/g;

defined will tell whether a field was provided or not.
length will tell whether a field was left blank or not.
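One difference from split with a negative LIMIT is worth checking before relying on this pattern. Because every part of the regex is optional, it can also match the empty string at the very end of the input, so a line that does not end in a separator appears to gain one extra empty element. The sketch below compares the two approaches. The %count hash is just a hypothetical helper for recording the results.

```perl
use strict;
use warnings;

my %count;
for my $line ( 'A|B', 'A|B||' ) {
    my @by_regex = $line =~ /([^|]*)\|?/g;     # global match in list context
    my @by_split = split /\|/, $line, -1;      # split keeping trailing empties
    $count{$line} = [ scalar @by_regex, scalar @by_split ];
    printf "%-6s regex=%d split=%d\n", $line, scalar @by_regex, scalar @by_split;
}
```

For 'A|B' the regex version returns three elements ('A', 'B', '') where split returns two, while for 'A|B||' both return four. If the exact field count matters, split with -1 is the safer choice.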


Re: split problem when emptiness is a valid element
by philcrow (Priest) on Oct 04, 2005 at 16:45 UTC

my @result = split( /(\|)/ );

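For an input such as A|B|||, the capturing parentheses make split return the separators as well, with an empty string between adjacent separators:

[ 'A', '|', 'B', '|', '', '|', '', '|' ]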

Phil


Re: split problem when emptiness is a valid element
by Moron (Curate) on Oct 05, 2005 at 09:14 UTC

-M

Free your mind


Re: split problem when emptiness is a valid element
by blazar (Canon) on Oct 04, 2005 at 16:37 UTC

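One crude workaround is to append a sentinel field such as '|you_know_what' to the line, so that the empty fields are no longer trailing, and then discard the last element: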

$ perl -le 'print for map "<$_>", split /,/, "a,b,c,,,,"'
<a>
<b>
<c>
$ perl -le 'print for map "<$_>", split /,/, "a,b,c,,,,FOO"'
<a>
<b>
<c>
<>
<>
<>
<FOO>

Update: now, of course inman's solution is a superior one and to all effects the right(TM) one, I'd say. I knew about this use of the LIMIT parameter, but for some reason it didn't spring to mind...
