mikecarlton has asked for the wisdom of the Perl Monks concerning the following question:

I want to split a line into field and value, separated by the first colon on the line, i.e.
my ($field, $value) = $_ =~ /^(.*?):(.*)$/
which turns "a:b" into ('a', 'b').

This works fine, but I also want to allow backslash escapes, in particular '\:' should not be a field delimiter, e.g. splitting "a\:b:c" should return ('a:b', 'c').

I believe the logic should be "match on a colon not preceded by an odd number of consecutive backslashes" (doubled backslashes are a literal backslash). My attempt is to match pairs of backslashes greedily, then a : not following a backslash, i.e.

my ($field, $value) = $_ =~ /^(.*(?:\\\\)*)(?<!\\):(.*)$/

But this and many other variations don't work. Suggestions?
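
To see where the attempt goes wrong, here is a small sketch that runs the pattern above against two inputs. The helper name `describe` is hypothetical and not part of the original post. The greedy `.*` runs to the last acceptable colon rather than the first, and the lookbehind rejects any colon that directly follows a backslash, even when that backslash is itself escaped.

```perl
use strict;
use warnings;

# The attempted pattern from the question, unchanged.
my $attempt = qr/^(.*(?:\\\\)*)(?<!\\):(.*)$/;

# Hypothetical helper: format the result of matching one line.
sub describe {
    my ($line) = @_;
    my @parts = $line =~ $attempt;
    return @parts
        ? "'$line' -> ('$parts[0]', '$parts[1]')"
        : "'$line' -> no match";
}

# Single-quoted literals: each '\\' below is one backslash in the data.
print describe('a:b:c'), "\n";     # wanted ('a', 'b:c'), gets ('a:b', 'c')
print describe('a\\\\:b'), "\n";   # wanted ('a\\', 'b'), gets no match
```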

Replies are listed 'Best First'.
Re: Handling escapes while splitting lines
by tye (Sage) on Jan 19, 2005 at 02:41 UTC
    my( $field, $value )= /((?:[^:\\]+|\\.)*):(.*)/s;

    - tye        

      Aha -- that does it. I like this approach much better (matching either runs of characters that are neither backslash nor colon, or 2-character escape sequences).

      Running some quick test code:

      my $re = shift or die "re expected\n";
      while (<>) {
          chomp;
          my ($field, $value) = ($_ =~ $re);
          print "'$_' -> ('$field', '$value')\n";
      }
      We see that it handles all these cases correctly:
      'a:b:c' -> ('a', 'b:c')
      'a\:b:c' -> ('a\:b', 'c')
      'a\\:b:c' -> ('a\\', 'b:c')
      'a\\\:b:c' -> ('a\\\:b', 'c')
      'a:' -> ('a', '')
      ':b' -> ('', 'b')
      The only thing it doesn't handle perfectly is the invalid input case with an escaped colon but no delimiter:
      'a\:b' -> ('', 'b')
      Notice that this is indistinguishable from the ':b' input. The fix is simple though, just anchor the start of the match:
      ^((?:[^:\\]+|\\.)*):(.*)
      For my use this works better, as the match fails, telling me there was no field delimiter present.

      Thanks
      --mike
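
Removing the escapes after the split is not shown in the thread. A minimal sketch, assuming every `\x` pair stands for a literal `x` in both field and value; `split_field` and `unescape` are hypothetical names, and the pattern is the anchored one above:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the backslash from each two-character escape,
# assuming "\x" always means a literal "x" (so "\\" becomes "\").
sub unescape {
    my ($text) = @_;
    $text =~ s/\\(.)/$1/gs;
    return $text;
}

# Hypothetical helper: split on the first unescaped colon using the anchored
# pattern. Returns an empty list when no delimiter is found.
sub split_field {
    my ($line) = @_;
    my ($field, $value) = $line =~ /^((?:[^:\\]+|\\.)*):(.*)/s
        or return;
    return (unescape($field), unescape($value));
}

# Single-quoted literal: the data is a\:b:c
my ($field, $value) = split_field('a\\:b:c');
print "$field, $value\n";    # a:b, c
```

Because the match is anchored, an input such as `a\:b` with no real delimiter yields an empty list, so the caller can tell it apart from `:b`.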

Re: Handling escapes while splitting lines
by Limbic~Region (Chancellor) on Jan 19, 2005 at 01:44 UTC
    mikecarlton,
    I tend to use split when I am trying to split.
    my $string = 'a\:b:c';
    my ($key, $val) = split /(?<!\\):/, $string, 2;
    print "$key, $val\n";
    Of course you then need to remove the escape characters from the resulting strings, but I left that as an exercise for the reader.

    Cheers - L~R

      I don't think that will work correctly if the last character of the key is an escaped backslash. That is,
      my $string = 'a\\\\:b:c';
      The key should be 'a\' and the value 'b:c' (after removing escapes), but you'll end up with 'a\:b' and 'c'.
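
A quick way to check this, as a sketch rather than anything definitive: run the lookbehind split and the alternation pattern `((?:[^:\\]+|\\.)*):(.*)` from tye's reply on the same input, before any escapes are removed. The array names are hypothetical.

```perl
use strict;
use warnings;

# Single-quoted literal: the data is a\\:b:c (two backslashes, then a colon).
my $string = 'a\\\\:b:c';

# The lookbehind only sees one character back, so it skips the first colon.
my @by_lookbehind = split /(?<!\\):/, $string, 2;

# The alternation consumes the backslashes in pairs, so the first colon splits.
my @by_pattern = $string =~ /^((?:[^:\\]+|\\.)*):(.*)/s;

print "lookbehind: ('$by_lookbehind[0]', '$by_lookbehind[1]')\n";  # ('a\\:b', 'c')
print "pattern:    ('$by_pattern[0]', '$by_pattern[1]')\n";        # ('a\\', 'b:c')
```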
        Eimi Metamorphoumai,
        Right you are. This is one of those confessions I need to make. It frustrates me when people respond to my posts without having read them closely enough to see that their reply is not applicable, and yet I have done the same thing here. I completely missed the part about a double backslash being a literal backslash. I was thinking that the only thing that could be escaped was the colon. Thanks for correcting me and reminding me of the importance of reading carefully.

        Cheers - L~R

Re: Handling escapes while splitting lines
by demerphq (Chancellor) on Jan 19, 2005 at 09:00 UTC
    perl -le "$str='a\\:b:c'; my (undef,$x,$y)=split /((?:[^:\\]+|\\.)*):/s,$str,2; print for $x,$y"
    a\:b
    c
    perl -le "$str='a\\:b:c'; my ($x,$y)=$str=~/^((?:[^:\\]+|\\.)*):(.*)$/s; print for $x,$y;"
    a\:b
    c
    ---
    demerphq