hossman has asked for the wisdom of the Perl Monks concerning the following question:

This is one of those weird little things that I figured would make more sense to me in the morning ... but it's not. I'm sure I'm overlooking something silly.

I'm splitting a string in such a way that lots of the 'fields' are empty. At first I thought these would come back as empty strings (i.e., ''), but evidently I was wrong: they're undefined...

bester:~> perl -Mstrict -Mwarnings -le 'print "defined!" if defined( (split /\//, "2|3|||||")[4]);'
bester:~>

No big deal, I'll just use map to convert the undefs -- or maybe not...

bester:~> perl -Mstrict -Mwarnings -le 'print "defined!" if defined( (map { defined($_) ? $_ : ""; } (split /\//, "2|3|||||") )[4]);'
bester:~>

This is when I decided it was time to get some sleep, but looking at it in the sunlight, I'm still not seeing my problem.

ideas?

Re: split/map weirdness: empty strings vs undef
by sauoq (Abbot) on Oct 04, 2002 at 20:07 UTC

    You aren't splitting on the right character and you should use a third argument to split if you want to preserve any of the trailing empty fields.

    perl -le '$_="2|3|||||"; @a = split /\|/; print scalar @a'
    2

    The reason that @a only has two elements is explained in the first paragraph of perldoc -f split:

    split Splits a string into a list of strings and returns that list. By default, empty leading fields are preserved, and empty trailing ones are deleted.

    If you know that the line you are splitting will have 7 elements, you can preserve them like this:

    perl -le '$_="2|3|||||"; @a = split /\|/, $_, 7; print scalar @a'
    7
    -sauoq
    "My two cents aren't worth a dime.";
    

      Thanks to all... yeah, the code I posted was splitting on the wrong character (that was an error in the test case, not in the code I found the problem in). More important was the "By default, empty leading fields are preserved, and empty trailing ones are deleted" sentence that I'd never noticed before ... I've dealt with records that had empty fields before without problems -- it never occurred to me that trailing blank fields would be treated differently.

      Since I won't always know how many total fields I should expect (I was planning on using the length of the array returned by split), I can either count the pipes, or just put an extra bogus field on the end before splitting it, and then pop it off. ... not sure how I feel about that, seems a bit ... dirty.

        You can avoid that behaviour by using a capturing pattern.
        $ perl -le'print join " : ", split /(\|)/, "2|3|||||"'
        2 : | : 3 : | : : | : : | : : | : : |
        This is also documented in perldoc -f split. Of course now every other element is the separator, which isn't what we wanted. So out with them:
        $ perl -le'print join " : ", grep ++$i%2, split /(\|)/, "2|3|||||"'
        2 : 3 : : : :
        There you go.

        Update: Yikes! I caught a mistake: the last field still disappears if empty. Count the pipes in the string and the colons in the output. At this point we lose some grace. The one-liner is ugly:

        $ perl -le'print(join(" : ", grep(++$i%2, split /(\|)/, "2|3|||||"), ("")x(1-$i%2)))'
        2 : 3 : : : : :
        while the multiliner is awkward:
        $ perl -le'my @field = grep ++$i%2, split /(\|)/, "2|3|||||"; push @field, "" unless $i%2; print join " : ", @field'
        2 : 3 : : : : :
        Oh well.

        Makeshifts last the longest.

        split can still be used - just specify a negative number for the number of fields. From split documentation:

        If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of "pop" would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.

        An example (with quoting changes so it will work on Win2K),

        C:\>perl -Mstrict -Mwarnings -le "print 'defined!' if defined( (split /\|/, '2|3|||||', -1)[4]);"
        defined!

        hth...
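
To make the LIMIT rules quoted above concrete, here is a minimal sketch that counts the fields split returns for a few LIMIT values. The helper name field_count is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: number of fields split returns for a given LIMIT.
# A LIMIT of 0 behaves the same as leaving LIMIT out.
sub field_count {
    my ($string, $limit) = @_;
    my @fields = split(/\|/, $string, $limit);
    return scalar @fields;
}

my $record = '2|3|||||';
printf "LIMIT  0 : %d\n", field_count($record, 0);   # trailing empty fields stripped
printf "LIMIT  3 : %d\n", field_count($record, 3);   # third field is the unsplit remainder
printf "LIMIT -1 : %d\n", field_count($record, -1);  # every field kept
```

With six pipes in the record, the negative LIMIT should report seven fields, which matches the count sauoq got by passing 7 explicitly.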

        After all of the bugs, updates, and general messiness caused by this question, I started wondering if using split was really the right way to go. It can be done with a regex:

        my @tests = ( '', qw( 2|3||||| 2 2| 2|3 2|3| | |2 |2| |2|3 |2|3| ));
        for my $string (@tests) {
            # Splits on '|' and preserves empty fields...
            my @fields = $string =~ m/((?:^|(?<=\|))[^|]*)/g;
            printf "%-10s :", "'$string'";
            print "($_)" for @fields;
            print "\n";
        }
        __END__
        ''         :()
        '2|3|||||' :(2)(3)()()()()()
        '2'        :(2)
        '2|'       :(2)()
        '2|3'      :(2)(3)
        '2|3|'     :(2)(3)()
        '|'        :()()
        '|2'       :()(2)
        '|2|'      :()(2)()
        '|2|3'     :()(2)(3)
        '|2|3|'    :()(2)(3)()

        However, constructing that regex wasn't particularly easy. I think, had I been faced with this task myself, I would have opted for the method you suggested. That is, append a non-empty field and then pop it off of the array after you split. At first it seems kind of a dirty trick but once you find how difficult it is to do it otherwise,

        my @fields = split /\|/, $string.'|x'; pop @fields;
        starts looking more and more elegant. I imagine it would score well on efficiency too.
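
A rough side-by-side of the append-and-pop trick and a negative LIMIT, as a sketch only. The sub names split_with_sentinel and split_keep_all are hypothetical and exist just for this comparison.

```perl
use strict;
use warnings;

# Append a throwaway non-empty field, split, then drop it again.
sub split_with_sentinel {
    my ($string) = @_;
    my @fields = split /\|/, $string . '|x';
    pop @fields;
    return @fields;
}

# Let split keep trailing empty fields itself via a negative LIMIT.
sub split_keep_all {
    my ($string) = @_;
    my @fields = split /\|/, $string, -1;
    return @fields;
}

for my $string ('2|3|||||', '2', '|2|', '') {
    my @sentinel = split_with_sentinel($string);
    my @limit    = split_keep_all($string);
    printf "%-10s sentinel=%d limit=%d\n",
        "'$string'", scalar @sentinel, scalar @limit;
}
```

The two should agree on every non-empty string. They part ways on the empty string: perldoc -f split notes that splitting an empty string produces zero fields regardless of LIMIT, while the sentinel version returns one empty field, the same answer the regex above gives.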

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: split/map weirdness: empty strings vs undef
by dws (Chancellor) on Oct 04, 2002 at 19:36 UTC
    One problem is that you're splitting on '/' when the delimiter is '|'. Since the string contains no '/', this results in a single-element list holding the complete string you tried to split. When you apply a subscript of 4, you get an undefined value.

    Try this:

    map { print defined($_) ? "def\n" : "undef\n" } split(/|/, "2|3|||||");

    Bug! That should be split(/\|/, ...) See below.
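
The single-element result described above also explains why the map in the original question made no difference. A small sketch of that reasoning follows; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $record = '2|3|||||';

# Splitting on "/" finds no delimiter, so the whole string comes back as one field.
my @fields = split m{/}, $record;
print scalar(@fields), " field(s)\n";

# map only visits elements that exist, so it has no undef to replace here.
my @mapped = map { defined($_) ? $_ : '' } @fields;
print scalar(@mapped), " mapped field(s)\n";

# Index 4 lies past the end of the list, so the slice yields undef.
my $fifth = (@mapped)[4];
print defined($fifth) ? "defined\n" : "undef\n";
```

In other words, the undef in the question never came from split or map. It came from subscripting past the end of a short list.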

      You'll need to escape the pipe, though, because it's a special char (alternation) in the regex...
      split(/\|/, "2|3|||||")

      -Blake

        You'll need to escape the pipe, though, because it's a special char (alternation) in the regex...

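        Quite so. And note that split(/\|/, "2|3||||"); gives a different result than split('|', "2|3||||"); Consult perlfunc for details. A string as the first argument is still treated as a pattern, so the unescaped '|' matches the empty string and splits between every character. You probably want to be using the former, escaped form.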
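
A quick way to check how the quoted and escaped forms differ, offered as a sketch rather than a definitive reference. The \Q...\E variant is an extra assumption on my part, for the case where the delimiter lives in a variable.

```perl
use strict;
use warnings;

my $record = '2|3||';

# A string first argument is still compiled as a pattern, so '|' is an
# alternation of two empty patterns and matches between every character.
my @by_string = split '|', $record;

# Escaping the pipe makes it a literal delimiter; -1 keeps trailing empties.
my @by_escaped = split /\|/, $record, -1;

# \Q...\E escapes metacharacters when the delimiter is held in a variable.
my $delim = '|';
my @by_quoted = split /\Q$delim\E/, $record, -1;

print join(',', map("[$_]", @by_string)),  "\n";
print join(',', map("[$_]", @by_escaped)), "\n";
print join(',', map("[$_]", @by_quoted)),  "\n";
```

The first line of output should show one bracket pair per character, pipes included, while the other two show the four real fields.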

      Almost. As blakem said, you have to escape the pipe. Maybe more importantly though, the trailing empty fields will disappear in a puff of smoke.

      perl -le 'map { print defined($_) ? "def" : "undef" } split(/\|/, "2|3|||||");'
      def
      def
      -sauoq
      "My two cents aren't worth a dime.";