Hi ikegami,

Thanks for your example. I'm still trying to figure it all out. I'm running it as below, and it doesn't seem to quite do what I want. I only want the escape character to be treated specially if it's in !+; - i.e. a!!b should be a!!b, whereas a!!!;b should be a!;b.

Also, I seem to be getting an empty field at the end. One or more semicolons at the end seem to be parsed properly, though.

One test string returns a blank result, and I'm not sure why.

    sub dequote {
        my $x = $_[0];
        $x =~ s/!(.)/$1/sg;
        return $x;
    }

    while (<>) {
        chomp;
        my @fields = map dequote($_), /\G((?:[^!;]+|!.)*)(?:;|\z)/sg;
        print "$_ => " . join( '|', @fields ) . "\n";
        # print "$_ => @fields\n";
    }

Sample results:

    aval!!!!;bval => aval!!|bval|
    aval!!!!!;bval => aval!!;bval|
    a!!val!!!!!;bval! =>
    !a!!!val!!!!!;bval!! => a!val!!;bval!|
    a!val!;bva!l; => aval;bval|
    a!!val!!;;bv!!al;; => a!val!||bv!al||

Re^4: split on delimiter unless escaped
by ikegami (Patriarch) on Nov 10, 2010 at 01:10 UTC

    I only want the escape character to be treated specially if it's in !+;

    Yuck! I hope you're being forced to deal with this format.

    It's not only trickier for a human to understand, it's trickier to code. In particular, the definition of a field varies based on whether it's the last field or not, and the function of the "!" varies based on its position in the field.

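        sub unescape {
            my $x = $_[0];
            # In a run of "!" before ";", each "!!" is a literal "!". A leftover
            # "!" escapes the ";" (keep it); otherwise the ";" is the field
            # delimiter (drop it).
            $x =~ s{(!*);}{ ('!' x int(length($1) / 2)) . (length($1) % 2 ? ';' : '') }ge;
            return $x;
        }

        # Field body: plain text, runs of "!" not followed by ";", and
        # escaped ";". Possessive quantifiers need Perl 5.10 or later.
        my $body        = qr/ (?: [^!;]++ | !++ (?!;) | (?:!!)* !; )* /x;
        my $last_field  = $body;
        my $other_field = qr/ $body (?:!!)* /x;

        # Validation
        my $record = qr/^ (?: $other_field ; )* $last_field \z/x;

        # Extraction (operates on $_). Each delimited field is captured
        # together with its ";", so no zero-length match can occur at the end.
        my @fields;
        push @fields, unescape($1) while / \G ( $other_field ; ) /xgc;
        push @fields, unescape( substr($_, pos($_) || 0) );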

    You are free to skip the validation.
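
    For comparison, the same rule can be sketched as a small token scanner instead of one large pattern. This is only an illustration of the rule as quoted above (a run of "!" matters only when it sits directly before a ";"), not code from this thread; split_record is a made-up name, and the treatment of an empty record as a single empty field is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record on ";", where a run of "!" is
# special only when it directly precedes a ";".
sub split_record {
    my ($record) = @_;
    my @fields = ('');

    # Token: either a (possibly empty) run of "!" followed by ";",
    # or any other single character.
    while ($record =~ /\G(?:(!*);|(.))/sgc) {
        if (defined $2) {
            # Ordinary character, including a "!" that is not before ";".
            $fields[-1] .= $2;
            next;
        }
        my $bangs = length $1;
        $fields[-1] .= '!' x int($bangs / 2);   # each "!!" is one literal "!"
        if ($bangs % 2) {
            $fields[-1] .= ';';                 # odd run: the ";" is escaped
        }
        else {
            push @fields, '';                   # even run: the ";" ends the field
        }
    }
    return @fields;
}

print join('|', split_record($_)), "\n"
    for 'a!!b', 'a!!!;b', 'aval!!!!;bval';
# prints:
#   a!!b
#   a!;b
#   aval!!|bval
```

    Because every ";" either ends a field or is escaped, the scanner accepts any input, and a record ending in an unescaped ";" yields a final empty field.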