
Thanks ikegami, using \G is a good idea. Thanks for all the effort.

I also coded a simple parser in the meantime. It seems to do the job for me, though without strict error-checking like yours. It also only splits the string on the commas; it doesn't split the keys and values yet. Because only alphanumeric keys are allowed anyway (I didn't mention that before, sorry), there is no need to look for escaped '=', etc.; i.e., everything that doesn't start with /\s*[\w_]+=/ is taken as a single key-less value.

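#!/usr/bin/perl
use strict;
use warnings;

# Split $string on commas that are neither inside "..." nor escaped
# with a backslash. Quotes and backslashes are kept in the fields.
sub cskwparser {
    my $string = shift;
    my $esc    = 0;    # previous char was an unescaped backslash
    my $quote  = 0;    # currently inside "..."
    my @args;
    my $narg = 0;      # index of the field being built
  CHAR: foreach my $char ( split //, $string ) {
        if ($esc) {
            $esc = 0;
        }
        elsif ( $char eq '"' ) {
            $quote = !$quote;
        }
        elsif ( $char eq '\\' ) {
            $esc = 1;
        }
        elsif ( $char eq ',' and not $quote ) {
            $narg++;
            next CHAR;
        }
        $args[$narg] .= $char;
    }
    return @args;
}

# Test loop:
local $, = '|';
while (<>) {
    print cskwparser $_;
}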
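The missing second stage (splitting each field into key and value by the rule above) could look roughly like the sketch below. `split_kv` is a hypothetical helper, not part of the original code, and `cskwparser` is repeated in condensed form so the block runs on its own. Since stage 1 keeps quotes and backslashes, they also survive in the values here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stage 1: cskwparser from above, condensed. Splits on commas that are
# neither inside "..." nor preceded by a backslash; quotes and
# backslashes are kept in the fields.
sub cskwparser {
    my $string = shift;
    my ( $esc, $quote, $narg ) = ( 0, 0, 0 );
    my @args;
    foreach my $char ( split //, $string ) {
        if    ($esc)                          { $esc = 0 }
        elsif ( $char eq '"' )                { $quote = !$quote }
        elsif ( $char eq '\\' )               { $esc = 1 }
        elsif ( $char eq ',' and not $quote ) { $narg++; next }
        $args[$narg] .= $char;
    }
    return @args;
}

# Stage 2 (hypothetical helper): turn each field into [key, value].
# Only a field starting with optional whitespace, word characters and
# '=' gets a key (\w already includes '_', so [\w_] matches the same as
# \w); anything else becomes a key-less value with key undef.
sub split_kv {
    my @pairs;
    for my $field (@_) {
        my $f = defined $field ? $field : '';    # empty field, e.g. "a,,b"
        if ( $f =~ /^\s*(\w+)=(.*)\z/s ) {
            push @pairs, [ $1, $2 ];
        }
        else {
            push @pairs, [ undef, $f ];
        }
    }
    return @pairs;
}

for my $pair ( split_kv( cskwparser('name=foo,"bar,baz",x=1') ) ) {
    my ( $key, $value ) = @$pair;
    printf "%s => %s\n", defined $key ? $key : '(no key)', $value;
}
```

This prints `name => foo`, `(no key) => "bar,baz"` and `x => 1`. Keeping the two stages separate means the comma splitter never has to know about keys.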

Re^3: Help with regex for complicated key=value string
by ikegami (Patriarch) on Oct 29, 2008 at 19:36 UTC

    $esc is never used in your code, which means "\\," is not handled yet.

    but without doing strict error-checking like yours

    It's not, really. It only checks for two errors: 'key=,...' (a key with no value) and 'v"alu"e' (invalid quoting).
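Those two checks could be applied to each field after the comma split, for example as in this sketch. `check_field` is a hypothetical name and this is not the code from the reply; it assumes a value is either wrapped entirely in "..." (with backslash escapes allowed inside) or contains no unescaped quote at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical validator for a single field (one piece after the comma
# split). It covers only the two errors named above and returns an
# error message, or nothing (undef in scalar context) if the field
# passes.
sub check_field {
    my ($field) = @_;
    my ( $key, $value ) =
      $field =~ /^\s*(\w+)=(.*)\z/s ? ( $1, $2 ) : ( undef, $field );

    # 'key=' followed by nothing (or only whitespace)
    return "key '$key' has no value"
      if defined $key && $value !~ /\S/;

    # A value wrapped entirely in "..." is fine
    return if $value =~ /^\s*"(?:[^"\\]|\\.)*"\s*\z/s;

    # Any other unescaped quote, e.g. v"alu"e, is malformed
    ( my $unescaped = $value ) =~ s/\\.//gs;
    return "invalid quoting in '$value'" if $unescaped =~ /"/;

    return;
}

for my $field ( 'key=', 'v"alu"e', 'k="a,b"', 'plain', 'k=a\"b' ) {
    my $err = check_field($field);
    printf "%-10s %s\n", $field, defined $err ? $err : 'ok';
}
```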

    Because only alphanumeric keys are allowed anyway (I didn't mention that before, sorry), there is no need to look for escaped '=', etc.; i.e., everything that doesn't start with /\s*[\w_]+=/ is taken as a single key-less value.

    That's the same strategy I used, but I didn't limit keys to alphanumeric characters. To do so, change
    /\G ( [^=,]+ ) = /xgc && ( $key = $1 );
    to
    /\G ( [a-zA-Z0-9]+ ) = /xgc && ( $key = $1 );
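For context, a complete \G-based parser built around that key pattern might look like the sketch below. It is not the code from the earlier reply: `parse_kv` is a hypothetical name, and the rules are assumptions chosen for illustration (quoted values may contain backslash escapes, a backslash outside quotes escapes the next character, empty values are rejected, and a stray quote in a bare value is an error).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical \G-based parser, a sketch rather than the code from the
# reply above. Each regex is anchored at pos() by \G; /gc moves pos()
# forward on success and leaves it unchanged on failure. Every pattern
# consumes at least one character, so Perl's ban on a zero-length /g
# match right after another zero-length match never comes into play.
sub parse_kv {
    my ($string) = @_;
    my @pairs;
    for ($string) {    # alias $_ to our copy of the string
        while (1) {
            /\G\s+/gc;    # skip leading whitespace, if any
            my $key;
            /\G ( [a-zA-Z0-9]+ ) = /xgc && ( $key = $1 );

            my $value;
            if (/\G " ( (?: [^"\\] | \\. )* ) " /xgcs) {    # "quoted"
                $value = $1;
                /\G\s+/gc;
            }
            elsif (/\G ( (?: [^,"\\] | \\. )+ ) /xgcs) {    # bare value
                $value = $1;
            }
            else {
                die "Missing value at position ", pos($_) || 0, "\n";
            }
            $value =~ s/\\(.)/$1/gs;    # unescape: \x becomes x
            push @pairs, [ $key, $value ];

            last if pos($_) == length($_);
            /\G,/gc or die "Expected ',' at position ", pos($_), "\n";
        }
    }
    return @pairs;
}

for my $pair ( parse_kv('a=1, b="x,y", plain, c=q\,r') ) {
    my ( $key, $value ) = @$pair;
    print defined $key ? "$key => $value\n" : "(no key) => $value\n";
}
```

Because the bare-value pattern excludes `"`, an input like `v"alu"e` stops after `v` and then fails the comma check, and `key=,x` fails with "Missing value", which mirrors the two error cases mentioned earlier.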

      $esc is used:
      if ($esc) { $esc = 0; } ... elsif ( $char eq '\\' ) { $esc = 1; }
      and "\\," is handled quite well in my tests.

      Am I missing something here?

        Never mind. I wrongly thought
        if ($esc) { $esc = 0; } ... elsif ( $char eq '\\' ) { $esc = 1; }
        was identical to
        ... elsif ( $char eq '\\' ) { }
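Running both variants on the same input makes the difference visible. The sketch below is cskwparser with a hypothetical `$use_esc` switch added (not part of the original code): with the flag on, the character after a backslash skips the quote and comma checks; with it off, the backslash is merely copied and the following comma still splits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# cskwparser from above, with a hypothetical $use_esc switch so the
# escape handling can be turned off for comparison.
sub split_fields {
    my ( $string, $use_esc ) = @_;
    my ( $esc, $quote, $narg ) = ( 0, 0, 0 );
    my @args;
    foreach my $char ( split //, $string ) {
        if    ($esc)                          { $esc = 0 }    # taken literally
        elsif ( $char eq '"' )                { $quote = !$quote }
        elsif ( $char eq '\\' )               { $esc = 1 if $use_esc }
        elsif ( $char eq ',' and not $quote ) { $narg++; next }
        $args[$narg] .= $char;
    }
    return @args;
}

my $input = 'a\,b,c';    # the string  a\,b,c
print join( '|', split_fields( $input, 1 ) ), "\n";    # prints a\,b|c
print join( '|', split_fields( $input, 0 ) ), "\n";    # prints a\|b|c
```

The same `$esc` branch is also what stops an escaped quote (`\"`) from toggling the quoting state.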