in reply to Re: split/map weirdness: empty strings vs undef
in thread split/map weirdness: empty strings vs undef

Thanks to all... yeah, the code I posted was splitting on the wrong character (that was an error in the test case, not in the code I found the problem in). More important was the "By default, empty leading fields are preserved, and empty trailing ones are deleted" sentence, which I'd never noticed before ... I've dealt with records that had empty fields before without problems -- it never occurred to me that trailing blank fields would be treated differently.

Since I won't always know how many total fields I should expect (I was planning on using the length of the array returned by split), I can either count the pipes, or just put an extra bogus field on the end before splitting it, and then pop it off ... not sure how I feel about that, seems a bit ... dirty.
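For anyone following along, here is a minimal sketch of the mismatch described above. The record string is just the sample used elsewhere in this thread, and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Sample record from this thread: 7 fields, the last 5 of them empty.
my $record = '2|3|||||';

# Default split: trailing empty fields are dropped.
my @fields = split /\|/, $record;

# Counting the separators gives the real number of fields.
my $expected = ($record =~ tr/|//) + 1;

printf "split returned %d fields, record has %d\n", scalar(@fields), $expected;
# prints: split returned 2 fields, record has 7
```

So the length of the array returned by a plain split cannot be trusted as the field count when the tail of the record is empty.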

Re^3: split/map weirdness: empty strings vs undef
by Aristotle (Chancellor) on Oct 04, 2002 at 22:26 UTC
    You can avoid that behaviour by using a capturing pattern.
    $ perl -le'print join " : ", split /(\|)/, "2|3|||||"'
    2 : | : 3 : | : : | : : | : : | : : |
    This is also documented in perldoc -f split. Of course now every other element is the separator, which isn't what we wanted. So out with them:
    $ perl -le'print join " : ", grep ++$i%2, split /(\|)/, "2|3|||||"'
    2 : 3 : : : :
    There you go.

    Update: Yikes! I caught a mistake: the last field still disappears if empty. Count the pipes in the string and the colons in the output. At this point we lose some grace. The one-liner is ugly:

    $ perl -le'print(join(" : ", grep(++$i%2, split /(\|)/, "2|3|||||"), ("")x(1-$i%2)))'
    2 : 3 : : : : :
    while the multiliner is awkward:
    $ perl -le'my @field = grep ++$i%2, split /(\|)/, "2|3|||||"; push @field, "" unless $i%2; print join " : ", @field'
    2 : 3 : : : : :
    Oh well.

    Makeshifts last the longest.

      That is damn elegant.

      UPDATE: After reading your update, I was a little disappointed that I wouldn't be able to use your clean little trick, until I looked at my data some more. Unlike every other "separated" data format I've ever seen, it always puts a trailing separator after the last field -- which means your original one-liner will work perfectly for me ... I need to ignore the last pipe.

      go figure

        You know, that gives me another idea for the cases where you need to preserve the last field, although it is so similar to the add-a-bogus-field approach that it raises the question of whether the capture/grep hoopla is worth the hassle if you went that route anyway: add a field separator to the string.
        $ perl -le'$_ = "2|3|||||"; print join " : ", grep ++$i%2, split /(\|)/, "$_|"'
        2 : 3 : : : : :

        Makeshifts last the longest.

Re: Re: Re: split/map weirdness: empty strings vs undef
by jsprat (Curate) on Oct 08, 2002 at 17:44 UTC
    split can still be used - just specify a negative number for the number of fields. From split documentation:

    If LIMIT is specified and positive, splits into no more than that many fields (though it may split into fewer). If LIMIT is unspecified or zero, trailing null fields are stripped (which potential users of "pop" would do well to remember). If LIMIT is negative, it is treated as if an arbitrarily large LIMIT had been specified.

    An example (with quoting changes so it will work on Win2K),

    C:\>perl -Mstrict -Mwarnings -le "print 'defined!' if defined( (split /\|/, '2|3|||||', -1)[4]);"
    defined!

    hth...
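A small side-by-side sketch of the two LIMIT cases quoted above, using the same sample string as the rest of the thread (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $record = '2|3|||||';

my @default  = split /\|/, $record;       # LIMIT omitted: trailing empty fields stripped
my @negative = split /\|/, $record, -1;   # negative LIMIT: every field kept

print scalar(@default),  "\n";   # 2
print scalar(@negative), "\n";   # 7
print join(' : ', map { "[$_]" } @negative), "\n";
# prints: [2] : [3] : [] : [] : [] : [] : []
```

The brackets are there only to make the empty fields visible in the output.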

Re: Re: Re: split/map weirdness: empty strings vs undef
by sauoq (Abbot) on Oct 05, 2002 at 01:32 UTC

    After all of the bugs, updates, and general messiness caused by this question, I started wondering if using split was really the right way to go. It can be done with a regex:

    my @tests = ( '', qw( 2|3||||| 2 2| 2|3 2|3| | |2 |2| |2|3 |2|3| ));
    for my $string (@tests) {
        # Splits on '|' and preserves empty fields...
        my @fields = $string =~ m/((?:^|(?<=\|))[^|]*)/g;
        printf "%-10s :", "'$string'";
        print "($_)" for @fields;
        print "\n";
    }
    __END__
    ''         :()
    '2|3|||||' :(2)(3)()()()()()
    '2'        :(2)
    '2|'       :(2)()
    '2|3'      :(2)(3)
    '2|3|'     :(2)(3)()
    '|'        :()()
    '|2'       :()(2)
    '|2|'      :()(2)()
    '|2|3'     :()(2)(3)
    '|2|3|'    :()(2)(3)()

    However, constructing that regex wasn't particularly easy. I think, had I been faced with this task myself, I would have opted for the method you suggested. That is, append a non-empty field and then pop it off of the array after you split. At first it seems kind of a dirty trick but once you find how difficult it is to do it otherwise,

    my @fields = split /\|/, $string.'|x'; pop @fields;
    starts looking more and more elegant. I imagine it would score well on efficiency too.
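As a rough sketch, the append-and-pop trick can be wrapped in a small sub. The name split_with_sentinel is hypothetical (it does not come from this thread), and the test strings are a subset of the ones used above:

```perl
use strict;
use warnings;

# Hypothetical helper name; the body is the two-statement trick from the post.
sub split_with_sentinel {
    my ($string) = @_;
    my @fields = split /\|/, $string . '|x';   # the sentinel keeps trailing empties alive
    pop @fields;                               # drop the sentinel again
    return @fields;
}

for my $string ('', '2', '2|', '|2|3|') {
    my @fields = split_with_sentinel($string);
    printf "%-8s :%s\n", "'$string'", join '', map { "($_)" } @fields;
}
```

For these inputs it yields the same field lists as the regex version, including a single empty field for the empty string.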

    -sauoq
    "My two cents aren't worth a dime.";
    
      Argh... I thought I could do better, but to get your exact results, all I could come up with was:
      my @fields = $string =~ m/([^|]*)\|?/g;
      $string && $string !~ /\|$/ && pop @fields;
      That second line can get a bit simpler if you define an empty string to have zero fields instead of having a single null field.

      Congrats, your solution is better than mine even though it looks unnecessarily complicated.

      I think the best suggestion so far is to tell split how many fields you want:

      my @fields = split(/\|/,$string,$string =~ tr/|/|/ + 1);
      Though I'd probably break it up into two lines:
      my $fieldcount = $string =~ tr/|/|/ + 1;
      my @fields = split(/\|/, $string, $fieldcount);
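A minimal sketch of the counted split as a sub, with one caveat worth knowing. The name split_counted is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper name; the logic is the counted split shown above.
sub split_counted {
    my ($string) = @_;
    my $fieldcount = ($string =~ tr/|//) + 1;   # number of separators + 1
    return split /\|/, $string, $fieldcount;
}

for my $string ('2|3|||||', '2|', '') {
    my @fields = split_counted($string);
    printf "%-10s : %d field(s)\n", "'$string'", scalar @fields;
}
```

Note that splitting an empty string produces zero fields regardless of LIMIT, so this version treats '' as having no fields rather than one empty field.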

      -Blake