ropey has asked for the wisdom of the Perl Monks concerning the following question:

I'm ashamed to POST this question.... in my defense I still have the hangover from hell..

Anyway I need to split a string on :, however sometimes the string will contain an escaped : like \: in which case this would not be split on - does that make sense

So if i had

$string = '111111111:22222\:2222:333333333:4444444';

And I split it I would have 111111, 222222, 333333, 444444 etc...

note to self - stop drinking when got to work...

Replies are listed 'Best First'.
Re: Easy Split
by ikegami (Patriarch) on Jul 03, 2009 at 15:28 UTC
    Text::CSV/Text::CSV_XS would do the trick as well.
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ':' });
    while (my $row = $csv->getline(*ARGV)) {
        my @fields = @$row;
        ...
    }
    die($csv->error_diag, "\n") if !$csv->eof;
      ok - so does that mean there is no solution just with split?
        Hi ropey,

        If I'm understanding your requirements correctly, you could use a negative-lookbehind assertion like this:

        use strict;
        use warnings;

        my $string = '111111111:22222\:2222:333333333:4444444';
        my @split = split(/(?<!\\):/, $string);
        print "Results:\n";
        map { print "$_\n" } @split;

        Which produces this:

        Results:
        111111111
        22222\:2222
        333333333
        4444444
        You'd still need to handle the escaped '\', but the above code at least doesn't split on ':' if it's preceded by '\'.  Of course, that doesn't take into account the situation where a backslash '\' is really the second of a pair of backslashes, as in:
        my $string = 'AAA:BBB:CCC\\:CCC:DDD';

        ... so it's really only a simplistic, partial solution.
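One way to cover the doubled-backslash case that the lookbehind misses is to scan the string token by token instead of calling split. The sketch below is only an illustration: split_escaped is a hypothetical helper name, and it assumes that a backslash escapes whichever single character follows it and that the escapes should be removed from the returned fields.

```perl
use strict;
use warnings;

# Hypothetical helper: split on unescaped ':' and unescape each field.
# Assumption: a backslash escapes the single character that follows it.
sub split_escaped {
    my ($str) = @_;
    my @fields = ('');
    while ($str =~ /\G(?:\\(.)|(:)|([^\\:]+)|\\\z)/gs) {
        if    (defined $1) { $fields[-1] .= $1 }     # escaped character, backslash dropped
        elsif (defined $2) { push @fields, '' }      # unescaped colon starts a new field
        elsif (defined $3) { $fields[-1] .= $3 }     # run of ordinary characters
        else               { $fields[-1] .= '\\' }   # lone backslash at end of string
    }
    return @fields;
}

print "$_\n" for split_escaped('111111111:22222\:2222:333333333:4444444');

# This single-quoted literal holds TWO backslashes before the colon,
# so the colon after them is a real separator.
print "$_\n" for split_escaped('AAA:BBB:CCC\\\\:CCC:DDD');
```

Unlike split, this keeps empty fields, including trailing ones; whether that is wanted depends on the data.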


        s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
        It means that $csv->getline is much simpler, much more readable and much more maintainable than any solution involving split, so I didn't see the point of coming up with a solution involving split.
        split /[^\\](?:\\\\)*\K:/, $string worked Ok for me: it splits on an even number of inverted bars, followed by a colon, but keeps the bars in the split strings.
        use 5.010;   # enables say; \K also needs perl 5.10

        sub x($) {
            local($_) = @_;
            say "\n$_ ==>";
            say for split /[^\\](?:\\\\)*\K:/, $_
        }
        x $_ for <DATA>;
        __DATA__
        111111111:22222\:2222:333333333:4444444
        111111111:22222\\:2222:333333333:4444444
        111111111:22222\\\:2222:333333333:4444444
        111111111:22222\\\\:2222:333333333:4444444
        Result:
        111111111:22222\:2222:333333333:4444444 ==>
        111111111
        22222\:2222
        333333333
        4444444

        111111111:22222\\:2222:333333333:4444444 ==>
        111111111
        22222\\
        2222
        333333333
        4444444

        111111111:22222\\\:2222:333333333:4444444 ==>
        111111111
        22222\\\:2222
        333333333
        4444444

        111111111:22222\\\\:2222:333333333:4444444 ==>
        111111111
        22222\\\\
        2222
        333333333
        4444444
        Beware that '222\\:222' only has ONE inverted bar... :-D
        []s, HTH, Massa (κς,πμ,πλ)
Re: Easy Split
by Anonymous Monk on Jul 03, 2009 at 14:41 UTC
Re: Easy Split
by tfoertsch (Beadle) on Jul 03, 2009 at 16:07 UTC
    $ perl -le 'print join "\n", map {s/\\://g;$_} "aa:b\\:b\\::\\:c\\:\\:c"=~m!(?>\\.|[^:])+!gs'
    aa
    bb
    cc

    $ perl -le 'print join "\n", map {s/\\://g;$_} "111111111:22222\\:2222:333333333:4444444"=~m!(?>\\.|[^:])+!gs'
    111111111
    222222222
    333333333
    4444444

    Torsten

      Why are you removing the colons?
      map {s/\\://g;$_}
      should be
      map { (my $s = $_) =~ s/\\(.)/$1/sg; $s }

      Update: hum... The OP did show a lack of colons in the desired output. If that's truly what he wants,

      map {s/\\://g;$_}
      should be
      map { my $s = $_; $s =~ s/\\(.)/$1/sg; $s =~ s/://g; $s }
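For reference, here is a small sketch that puts the field-matching regex from the one-liners together with the first corrected map, so the escaped colon survives as a plain colon. The sub name extract_fields is made up for illustration; the two patterns are the ones quoted in this subthread.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two regexes quoted above.
sub extract_fields {
    my ($string) = @_;
    # m!(?>\\.|[^:])+!gs grabs each run of escaped pairs or non-colon characters;
    # s/\\(.)/$1/sg then strips one level of backslash escaping from a copy.
    return map { (my $s = $_) =~ s/\\(.)/$1/sg; $s }
           $string =~ m!(?>\\.|[^:])+!gs;
}

print "$_\n" for extract_fields('111111111:22222\:2222:333333333:4444444');
```

Because the pattern needs at least one character per match, empty fields (as in 'a::b') are silently skipped, which may or may not be acceptable.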
Re: Easy Split
by rovf (Priest) on Jul 03, 2009 at 15:56 UTC

    First split on the pattern /(?<=[^\\]):/. (A plain /[^\\]:/ would also consume the character before each colon.) This gives you the correct grouping, but leaves you with spurious \:. You can use map and s/// to delete them too, i.e.

    map { s/\\://g; $_ } split(...)
    For completeness: This would fail if a colon can be the very first character in your string.

    -- 
    Ronald Fischer <ynnor@mm.st>
Re: Easy Split
by ww (Archbishop) on Jul 03, 2009 at 20:20 UTC
    No matter how one reads the totality of your node, there's some sort of disconnect between "an escaped : like \: in which case this would not be split on" and the desired output you show.

    And the mismatch in the number of digits (for example, nine "1"s in $string and only 6 in the desired output; that is what the para after the code is, isn't it?) makes it doubly hard to find a single meaning for "would not be split on".

    • do you want to keep the \:
    • to lose it but keep the second set of "2"s
    • to lose both (if so, why six "2"s in the output and only 5 in $string)?

    And, just BTW, the colon is NOT escaped in the $string you show; since it's single-quoted, you're showing us a literal backslash followed by a colon.
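A quick way to check what a quoted literal really contains is to print its length. The sketch below uses made-up variable names and compares the single-quoted form from the question with a double-quoted one and with a doubled backslash.

```perl
use strict;
use warnings;

my $single = '22222\:2222';   # single quotes: the backslash stays in the string
my $double = "22222\:2222";   # double quotes: \: is reduced to a plain colon
my $pair   = '222\\:222';     # even in single quotes, \\ yields ONE backslash

printf "%-7s %2d chars: %s\n", 'single', length($single), $single;
printf "%-7s %2d chars: %s\n", 'double', length($double), $double;
printf "%-7s %2d chars: %s\n", 'pair',   length($pair),   $pair;
```

So the test string in the question does contain a backslash at run time, and any regex applied to it has to match that backslash explicitly.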

    Perhaps you can clarify.

Re: Easy Split
by Marshall (Canon) on Jul 06, 2009 at 15:51 UTC
    Wow! I guess this wasn't that "easy". It is tricky.
    #!/usr/bin/perl -w
    use strict;

    my $string = '111111111:22222\:2222:333333333:4444444';
    $string =~ s/\\:/:/;    #translate \: to just :
                            #the "trick" and its all over now...
    chomp ($string);        #don't worry about trailing \n
    my @strings = split (/:/,$string);
    foreach my $line (@strings)
    {
        print "$line\n";
    }
    __END__
    prints:
    111111111
    22222
    2222
    333333333
    4444444
    Update: the above code produces exactly what the OP asked for, but it could be that 22222\:2222 is a typo and that this "2" string doesn't include a backslash. If so then just split on [:\n] or something like that.
      Trickier than it seems :-D Apparently, the OP wants NOT to split on '\:'; so the answer should be:
      111111111 22222\:2222 333333333 4444444
      []s, HTH, Massa (κς,πμ,πλ)
        I think that we have all done a great job of presenting various alternatives. Hooray! I think it is now up to the OP to say what he thinks about our suggestions. I tried. You tried. A bunch of folks have tried. It is pointless for us to argue about what the OP wanted. Let's see what develops. One of the possible scenarios is that the OP says thanks to everybody. We have all tried to help.