http://qs1969.pair.com?node_id=226531

gmax has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a simple script language that needs to parse arguments passed as keyword=value pairs within the same string.
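There are a few constraints, which account for some additional complexity: keywords are case insensitive; values can be bare words or strings quoted with single quotes, double quotes, or backticks; quoted values may contain escaped quotes; spaces around the equal sign are optional; and the pairs can come in any order.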
Example source strings are:
```perl
my @all_keywords       = (qw(one two three));
my @mandatory_keywords = (qw(two three));

my $source1 = q{ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' };

my $source2 = <<'END';
two=a_34
three = 'name="O\'Hara"'
ONE="xyz\t"
END
```
The desired output, from both sources, is a hash containing:
```perl
my %statement = (
    one   => 'xyz\t',
    two   => 'a_34',
    three => q{name="O\'Hara"},
);
```
In addition, I need to make sure that all the keywords are valid ones, and that the mandatory keywords are defined. Meeting all the requirements is not extremely difficult.
Please have a look at my test code (the real code is a full-fledged module):
```perl
#!/usr/bin/perl -w
use strict;

my @all_keywords       = (qw(one two three four five));
my @mandatory_keywords = (qw(two three four));

my $RE_value = qr/
    (\w+)               # (1) a keyword
    \s* = \s*           # an equal sign with optional spaces
    (?:                 # quoted value ...
        (               #
          [\'\"\`]      # (2) a quoting character
        )
        (               # (3) the quoted value:
          (?:           # either
            \\\2        #   an escaped quote
          |             # or
            (?!\2) .    #   any non-quote character (inside [...],
                        #   \2 would be an octal escape, not a backreference)
          ) +?          # repeat (non-greedily)
        )
        \2              # until the initial quote shows up again
    |
        (\S+)           # (4) ... bare word value
    )
/xs;

sub set_value {
    my ($stat, $kw, $value) = (@_);
    # case insensitive keyword
    return 0 unless exists $stat->{lc $kw};
    $stat->{lc $kw} = $value;
    return 1
}

sub parse_pairs {
    my $src = shift;
    my %statement = map {$_, undef} @all_keywords;
    for ($src) {
        while ( ! m/ \G \s* \z /gcx ) {
            my $result = 0;
            if ( /\G \s* $RE_value \s* /xgc ) {
                # test definedness: a bare value of "0" is false
                $result = set_value( \%statement, $1, defined $4 ? $4 : $3 );
            }
            else {
                # pos() is undef if nothing has matched yet
                die "syntax error >" . substr($_, pos($_) || 0) . "\n";
            }
            die "invalid keyword $1 \n" unless $result;
        }
    }
    return \%statement;
}

sub check_pairs {
    my $statement = shift;
    for my $kw (@all_keywords) {
        if (defined $statement->{$kw}) {
            print "$kw \t -> <$statement->{$kw}>\n"
        }
        else {
            warn "- missing keyword <$kw>!\n"
                if grep {$kw eq $_} @mandatory_keywords;
        }
    }
}

my @sources = (
    q{ ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' four=`'one' two` five = ah! },
    q{ five = ah! ONE="xyz\t" three = 'name="O\'Hara" two=a_34' four=`'one' two` }
);

for (@sources) {
    print "\n>>Source: //$_//\n\n";
    my $stat = parse_pairs($_);
    check_pairs($stat);
}

__END__
output:

>>Source: // ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' four=`'one' two` five = ah! //

one      -> <xyz\t>
two      -> <a_34>
three    -> <name="O\'Hara">
four     -> <'one' two>
five     -> <ah!>

>>Source: // five = ah! ONE="xyz\t" three = 'name="O\'Hara" two=a_34' four=`'one' two` //

one      -> <xyz\t>
- missing keyword <two>!
three    -> <name="O\'Hara" two=a_34>
four     -> <'one' two>
five     -> <ah!>
```
This regex correctly captures both bare words and quoted strings, handling embedded quotes as well as the escaped quote in the name.
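One detail worth double-checking when writing the quoted branch: inside a bracketed character class, Perl reads `\2` as the octal escape for chr(2), not as a backreference, so a class like `[^\2]` does not mean "any non-quote character". It only seems to work when a non-greedy repeat stops at the closing quote first. The small check below (a standalone sketch, not part of the module) confirms this and shows a negative lookahead, `(?!\1).`, which expresses the intended test.

```perl
use strict;
use warnings;

# Inside a bracketed class, \2 is the octal escape for chr(2), not a
# backreference, so [^\2] excludes only "\x02", not the quote character.
my $quote_ok = ("'"    =~ /^[^\2]$/) ? 1 : 0;   # 1: a quote is accepted
my $ctrl_ok  = ("\x02" =~ /^[^\2]$/) ? 1 : 0;   # 0: only chr(2) is excluded

# A negative lookahead says "any character except the opening quote".
my $value = '';
if (q{'it\'s' rest} =~ /^(['"`])((?:\\\1|(?!\1).)+?)\1/s) {
    $value = $2;    # it\'s (the escaped quote is kept verbatim)
}

print "$quote_ok $ctrl_ok $value\n";
```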

Questions:
(1) Could I have achieved the same result using any standard module?
(2) Also, does anyone spot any weakness where the paradigm may break?
So far, it is robust enough to correctly handle sources like:
```perl
q{one="two=xyz" two=abc}           # embedded keyword pattern
q{one="xyz two=abc three= efg"}    # missing quotes
```
In the first case, the quoted string "two=xyz" is consumed as the value of one, so the engine resumes looking for the next match after the closing quote, correctly assigning "two=xyz" to one and "abc" to two.
The second case is an input mistake: the quoted value swallows the rest of the string, so two and three are never set, and the error is caught by the mandatory-keyword check that runs after parsing.
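Both cases can be reproduced with a condensed copy of the parsing loop. This is a sketch under stated assumptions: `pairs_of` is a hypothetical helper, the regex is a compact restatement of `$RE_value` (same group numbering, with the quoted part written as a lookahead), and unlike `parse_pairs` it simply stops at unparsable text instead of reporting a syntax error.

```perl
use strict;
use warnings;

# Compact restatement of $RE_value: (1) keyword, (2) quote, (3) quoted
# value, (4) bare word. An illustrative assumption, not the module's code.
my $RE_value = qr/(\w+)\s*=\s*(?:(['"`])((?:\\\2|(?!\2).)+?)\2|(\S+))/s;

# Hypothetical helper: collect keyword => value pairs with a \G loop.
sub pairs_of {
    my ($src) = @_;
    my %got;
    while ($src =~ /\G\s*$RE_value\s*/gc) {
        $got{lc $1} = defined $4 ? $4 : $3;
    }
    return \%got;
}

my $first  = pairs_of(q{one="two=xyz" two=abc});
my $second = pairs_of(q{one="xyz two=abc three= efg"});

# The quoted value swallows "two=xyz", so the next match starts at "two=abc".
print "$first->{one} | $first->{two}\n";            # two=xyz | abc

# Everything lands in 'one'; the mandatory keywords are simply absent.
my @missing = grep { !exists $second->{$_} } qw(two three);
print "missing: @missing\n";                         # missing: two three
```

Because each match resumes at `\G`, text inside a quoted value can never be mistaken for a new pair; the missing-quote mistake therefore surfaces only as absent keywords, which is why the final mandatory check is needed.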
As for preliminary research, I looked at Text::Balanced, which can deal with all three kinds of quotes, but it is not clear to me whether and how it can also handle bare words at the same time, or how it would fit into the engine.
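One way Text::Balanced might fit, offered only as a rough sketch: a regex takes the keyword and the equal sign, `extract_delimited` (called in list context, where it returns the extracted string with its delimiters, then the remainder) takes a quoted value if one follows, and a plain regex handles bare words. `parse_with_balanced` is a hypothetical name; keyword validation and the mandatory check are left out.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Hypothetical helper: keyword and "=" via regex, quoted values via
# extract_delimited, anything else treated as a bare word.
sub parse_with_balanced {
    my ($text) = @_;
    my %pairs;
    while ($text =~ s/^\s*(\w+)\s*=\s*//) {
        my $kw = lc $1;
        # list context: (extracted string with its quotes, remainder, prefix)
        my ($quoted, $rest) = extract_delimited($text, q{'"`});
        if (defined $quoted) {
            $pairs{$kw} = substr($quoted, 1, -1);   # drop the outer quotes
            $text = $rest;
        }
        elsif ($text =~ s/^(\S+)//) {
            $pairs{$kw} = $1;                       # bare word value
        }
        else {
            die "missing value for keyword $kw\n";
        }
    }
    die "syntax error >$text\n" if $text =~ /\S/;
    return \%pairs;
}

my $stat = parse_with_balanced(q{ONE="xyz\t" two=a_34 three = 'name="O\'Hara"'});
print "$_ -> <$stat->{$_}>\n" for sort keys %$stat;
```

Like the regex version, this keeps escaped quotes verbatim (`O\'Hara` stays `O\'Hara`); the difference is that quote matching, including the default backslash escape, is delegated to the module.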
TIA
 _  _ _  _  
(_|| | |(_|><
 _|