mtovey has asked for the wisdom of the Perl Monks concerning the following question:

So, this is yet another string parsing question that has probably been asked before, but I am so far unable to find a solution.

I have a string that looks like the following:

"name1=value1 name2=' value2=0' name3=value3"

I need a loop that will break each pair into two scalars, $name and $value, and process them. The killer for me is the "name2=' value2=0'" pair. I need the leading space in that value to be included in $value.

So far I am ending up with $name set to "name2" and $value set to "'" during one iteration of the loop, and $name set to "value2" and $value set to "0'" during the next iteration. What I am expecting is $name set to "name2" and $value set to " value2=0" during a single iteration.

I assume that a regular expression can be written for this, but my perlre is pretty thin right now. Any help will be greatly appreciated!

-Mark

Replies are listed 'Best First'.
Re: parse string containing space
by AnomalousMonk (Archbishop) on May 01, 2015 at 04:37 UTC

    One possible way:

    c:\@Work\Perl>perl -wMstrict -MData::Dump -le
    "my $s = q{name1=value1 name2=' value2=0' name3=value3};
     ;;
     my $name     = qr{ \w+ }xms;
     my $plain    = qr{ \w+ }xms;
     my $s_quoted = qr{ ' [^\x27]* ' }xms;
     ;;
     my %h = $s =~ m{ ($name) \s* = \s* ($s_quoted | $plain) \s* }xmsg;
     dd \%h;
    "
    { name1 => "value1", name2 => "' value2=0'", name3 => "value3" }

    Please see perlre, perlrequick, and perlretut.

    Updates:

    1. Note that  \x27 in  [^\x27] represents a  ' (single-quote) character. I have to use this form because my REPL does not like unbalanced single-quotes in a command-line code expression. You can use  [^'] like a sane person.
    2. Another and probably better approach would be to forget regexes and use Text::CSV or Text::CSV_XS.
    3. If you need to get rid of the single-quotes and have Perl version 5.10+, try this. Note that  $s_quoted is changed and the  m// match uses  (?|pattern) from Extended Patterns.
      c:\@Work\Perl>perl -wMstrict -MData::Dump -le
      "use 5.010;
       ;;
       my $s = q{name1=value1 name2=' value2=0' name3=value3};
       ;;
       my $name     = qr{ \w+ }xms;
       my $plain    = qr{ \w+ }xms;
       my $s_quoted = qr{ [^\x27]* }xms;
       ;;
       my %h = $s =~ m{ ($name) \s* = \s* (?| ' ($s_quoted) ' | ($plain)) \s* }xmsg;
       dd \%h;
      "
      { name1 => "value1", name2 => " value2=0", name3 => "value3" }

      If you don't have 5.10, let me know. There's a simple alternative.
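
For Perls older than 5.10, one possible alternative (a minimal sketch, not necessarily the one AnomalousMonk had in mind) is to give each branch of the alternation its own capture group and pick whichever one is defined. It is written as a while loop so that each pair lands in $name and $value, as the original question asked. The @pairs array is only there for illustration.

```perl
use strict;
use warnings;

my $s = q{name1=value1 name2=' value2=0' name3=value3};

my @pairs;    # illustrative: collects [ name, value ] for later inspection

# $2 captures the inside of a single-quoted value, $3 a plain \w+ value.
# Only one of the two is defined on any given match.
while ( $s =~ m{ (\w+) \s* = \s* (?: ' ([^']*) ' | (\w+) ) }xmsg ) {
    my $name  = $1;
    my $value = defined $2 ? $2 : $3;

    push @pairs, [ $name, $value ];
    print "name=[$name] value=[$value]\n";
}
```

Testing with defined rather than for truth matters here: an empty quoted value ('') or a value of 0 would otherwise fall through to the wrong branch.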


    Give a man a fish:  <%-(-(-(-<

Re: parse string containing space
by choroba (Cardinal) on May 01, 2015 at 13:37 UTC
    Build a parser. Marpa::R2 can help you with it:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use Marpa::R2;
    use Data::Dumper;

    my $input = q(name1=value1 name2=' value2=0' name3=value3);

    my $dsl = << '__DSL__';

    lexeme default = latm => 1

    :default ::= action => first

    List   ::= Pair                action => single
             | Pair white List     action => store
    Pair   ::= Key '=' Value       action => pair
    Key    ::= noneq
    Value  ::= word | Quoted
    Quoted ::= quote string quote  action => second

    noneq  ~ [^=]+
    word   ~ [^\s]+
    string ~ [^']+
    white  ~ [\s]+
    quote  ~ [']

    __DSL__

    sub first  { $_[1] }
    sub second { $_[2] }
    sub single { [ $_[1] ] }
    sub pair   { +{ $_[1] => $_[3] } }
    sub store  { [ $_[1], @{ $_[3] } ] }

    my $grammar   = 'Marpa::R2::Scanless::G'->new( { source => \$dsl } );
    my $value_ref = $grammar->parse( \$input, 'main' );
    print Dumper $value_ref;

    Output:

    $VAR1 = \[
                {
                  'name1' => 'value1'
                },
                {
                  'name2' => ' value2=0'
                },
                {
                  'name3' => 'value3'
                }
              ];
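
To get from that structure back to the $name and $value scalars the original question asked for, something like the following should do. This is a sketch that hard-codes the structure from the Dumper output above rather than calling Marpa::R2, so it runs on its own. The @seen array is only there for illustration.

```perl
use strict;
use warnings;

# Same shape as the Dumper output: a reference to an array reference
# holding one single-pair hash per name=value pair.
my $value_ref = \[
    { name1 => 'value1' },
    { name2 => ' value2=0' },
    { name3 => 'value3' },
];

my @seen;    # illustrative: records each pair as it is processed

for my $pair ( @{ $$value_ref } ) {
    # Each hash holds exactly one pair, so list assignment yields (key, value).
    my ( $name, $value ) = %$pair;

    push @seen, "$name=[$value]";
    print "name=[$name] value=[$value]\n";
}
```

Because the parse result is an array of one-pair hashes rather than a single hash, the original order of the pairs is preserved.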
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Two good possibilities! I will try to test this weekend and see what shakes out.

      Thanks!