in reply to Re: split on delimiter unless escaped
in thread split on delimiter unless escaped

Not quite. "+" means you can't have empty fields. And if you change it to "*", you can get one too many empty fields. That's why my solution is slightly different.
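
To make the difference concrete, here is a minimal sketch. It assumes a simplified field pattern, `[^;]`, as a stand-in (the pattern from the parent post isn't reproduced here, and escapes are ignored on purpose). It only illustrates how "+" loses empty fields and how "*" returns more matches than there are fields.

```perl
use strict;
use warnings;

# Simplified stand-in pattern (assumption): a field is a run of non-";" characters.
my $input = 'a;;b';    # three fields: "a", "", "b"

# "+" needs at least one character, so the empty middle field is skipped.
my @plus = $input =~ /([^;]+)/g;

# "*" can also match the empty string, so extra empty matches show up,
# including one at the very end of the input.
my @star = $input =~ /([^;]*)/g;

printf "plus: %d matches (%s)\n", scalar @plus, join '|', @plus;
printf "star: %d matches (%s)\n", scalar @star, join '|', @star;
```

With "+" only "a" and "b" come back; with "*" there are more matches than the three fields actually present.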

Re^3: split on delimiter unless escaped
by yrp001 (Initiate) on Nov 10, 2010 at 00:19 UTC

    Hi ikegami,

    Thanks for your example. I'm still trying to figure it all out. I'm running it as below, and it doesn't seem to quite do what I want. I only want the escape character to be treated specially if it's in !+; - i.e. a!!b should be a!!b, whereas a!!!;b should be a!;b.

    Also, I seem to be getting an empty field at the end. One or more semicolons at the end seem to be parsed properly, though.

    One test string returns a blank result, though. Why?

    sub dequote {
        my $x = $_[0];
        $x =~ s/!(.)/$1/sg;
        return $x;
    }

    while (<>) {
        chomp;
        my @fields = map dequote($_), /\G((?:[^!;]+|!.)*)(?:;|\z)/sg;
        print "$_ => " . join( '|', @fields ) . "\n";
        # print "$_ => @fields\n";
    }

    Sample results:

    aval!!!!;bval => aval!!|bval|
    aval!!!!!;bval => aval!!;bval|
    a!!val!!!!!;bval! =>
    !a!!!val!!!!!;bval!! => a!val!!;bval!|
    a!val!;bva!l; => aval;bval|
    a!!val!!;;bv!!al;; => a!val!||bv!al||

      I only want the escape character to be treated specially if it's in !+;

      Yuck! I hope you're being forced to deal with this format.

      It's not only trickier for a human to understand, it's trickier to code. In particular, the definition of a field varies based on whether it's the last field or not, and the function of the "!" varies based on its position in the field.

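      # Undo the escaping in one raw field. Only a run of "!" directly before
      # a ";" is special: each "!!" pair yields "!", and an odd run also
      # escapes the ";". An even run means the ";" was the delimiter, so drop it.
      sub unescape {
         my $x = $_[0];
         $x =~ s{(!*);}{ ('!' x int(length($1) / 2)) . (length($1) % 2 ? ';' : '') }ge;
         return $x;
      }

      # Plain text, a run of "!" not followed by ";", or an escaped ";".
      my $body        = qr/ (?: [^!;]++ | !+ (?![!;]) | (?:!!)* !; )* /x;
      my $other_field = qr/ $body (?:!!)* ; /x;   # ends at an unescaped ";"
      my $last_field  = qr/ $body /x;             # runs to the end of the record

      # Validation
      my $record = qr/^ (?: $other_field )* $last_field \z/x;

      # Extraction (matches against $_)
      my @fields;
      push @fields, unescape($1) while / \G ( $other_field ) /xgc;
      push @fields, unescape($1) if    / \G ( $last_field ) \z /xgc;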

      You are free to skip the validation.
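
The rule described above (a run of "!" is special only when it sits directly before a ";") can also be applied with a small single-pass tokenizer instead of one large pattern. What follows is a minimal sketch of that alternative, not the code from this post. `split_escaped` is a hypothetical name, and it assumes that a "!" at the very end of the record is literal.

```perl
use strict;
use warnings;

# split_escaped() is a hypothetical helper name, not something from the thread.
sub split_escaped {
    my ($record) = @_;
    my @fields = ('');

    # Each token is either a (possibly empty) run of "!" followed by ";",
    # or literal text: a run of "!" not followed by ";", or other characters.
    while ($record =~ / \G (?: (!*) ; | ( !+ | [^!;]+ ) ) /xgc) {
        if (defined $2) {
            $fields[-1] .= $2;                   # literal text, kept as-is
            next;
        }
        my $bangs = length $1;
        $fields[-1] .= '!' x int($bangs / 2);    # each "!!" pair becomes "!"
        if ($bangs % 2) { $fields[-1] .= ';' }   # odd run: the ";" is escaped
        else            { push @fields, '' }     # even run: real delimiter
    }
    return @fields;
}

for my $record ('a!!b', 'a!!!;b', 'aval!!!!;bval', 'a;;b') {
    print "$record => ", join('|', split_escaped($record)), "\n";
}
```

With these inputs, a!!b stays a!!b, a!!!;b becomes the single field a!;b, and aval!!!!;bval splits into aval!! and bval. A trailing delimiter yields a trailing empty field, so drop the last element if you don't want that.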