gmax has asked for the wisdom of the Perl Monks concerning the following question:
    my @all_keywords       = (qw(one two three ));
    my @mandatory_keywords = (qw(two three ));

    my $source1 = q{ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' };

    my $source2 = <<'END';
    two=a_34
    three = 'name="O\'Hara"'
    ONE="xyz\t"
    END

The desired output, from both sources, is a hash containing:
    my %statement = (
        one   => 'xyz\t',
        two   => 'a_34',
        three => q{name="O\'Hara"},
    );

In addition, I need to make sure that all the keywords are valid ones, and that the mandatory keywords are defined. Meeting all these requirements is not especially difficult.
    #!/usr/bin/perl -w
    use strict;

    my @all_keywords       = (qw(one two three four five));
    my @mandatory_keywords = (qw(two three four ));

    my $RE_value = qr/
        (\w+)               # (1) a keyword
        \s* = \s*           # an equal sign with optional spaces
        (?:                 # quoted value ...
            (
                [\'\"\`]    # (2) a quoting character
            )
            (               # (3) the quoted value:
                (?:         # either
                    \\\2    #   an escaped quote
                    |       # or
                    [^\2]   #   meant as: any non-quote character (but in a
                            #   character class \2 is an octal escape, not
                            #   a backreference)
                ) +?        # repeat (non-greedily)
            )
            \2              # until the initial quote shows up again
            |
            (\S+)           # (4) ... bare word value
        )
        /x;

    sub set_value {
        my ($stat, $kw, $value) = (@_);
        # case insensitive keyword
        return 0 unless exists $stat->{lc $kw};
        $stat->{lc $kw} = $value;
        return 1
    }

    sub parse_pairs {
        my $src = shift;
        my %statement = map {$_, undef} @all_keywords;
        for ($src) {
            while ( ! m/ \G \s* \z /gcx ) {
                my $result = 0;
                if ( /\G \s* $RE_value \s* /xgc ) {
                    # test definedness, so that a bare value of 0 is kept
                    $result = set_value( \%statement, $1,
                        defined $4 ? $4 : $3 );
                }
                else {
                    die "syntax error >" . substr($_, pos) ."\n";
                }
                die "invalid keyword $1 \n" unless $result;
            }
        }
        return \%statement;
    }

    sub check_pairs {
        my $statement = shift;
        for my $kw (@all_keywords) {
            if (defined $statement->{$kw}) {
                print "$kw \t -> <$statement->{$kw}>\n"
            }
            else {
                warn "- missing keyword <$kw>!\n"
                    if grep {$kw eq $_} @mandatory_keywords;
            }
        }
    }

    my @sources = (
        q{ ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' four=`'one' two` five = ah! },
        q{ five = ah! ONE="xyz\t" three = 'name="O\'Hara" two=a_34' four=`'one' two` });

    for (@sources) {
        print "\n>>Source: //$_//\n\n";
        my $stat = parse_pairs($_);
        check_pairs($stat);
    }

    __END__
    output:

    >>Source: // ONE="xyz\t" two=a_34 three = 'name="O\'Hara"' four=`'one' two` five = ah! //

    one      -> <xyz\t>
    two      -> <a_34>
    three    -> <name="O\'Hara">
    four     -> <'one' two>
    five     -> <ah!>

    >>Source: // five = ah! ONE="xyz\t" three = 'name="O\'Hara" two=a_34' four=`'one' two` //

    one      -> <xyz\t>
    - missing keyword <two>!
    three    -> <name="O\'Hara" two=a_34>
    four     -> <'one' two>
    five     -> <ah!>

This regex correctly captures both the bare words and the quoted strings, taking care of embedded quotes and of the escaped quote in the name.
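One detail of the regex above may be worth a closer look: inside a bracketed character class, Perl reads \2 as an octal escape rather than as a backreference, so [^\2] accepts nearly any character, and it is the non-greedy quantifier that actually stops the match at the closing quote. What follows is a minimal sketch of one possible variant, not the code from the post, which expresses "not the opening quote" with a negative lookahead instead. The names $RE_pair and pairs_in are hypothetical, and treating any backslash-escaped character as a unit is an assumption about the desired behavior.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of the posted regex (an assumption, not the original).
my $RE_pair = qr/
    (\w+)                   # (1) keyword
    \s* = \s*               # equal sign with optional spaces
    (?:
        ([\'\"\`])          # (2) opening quote
        (                   # (3) quoted value
            (?:
                \\ .        # a backslash plus the character it escapes
              |
                (?!\2) .    # any character that is not the opening quote
            )*
        )
        \2                  # the matching closing quote
      |
        (\S+)               # (4) bare word value
    )
/xs;

# Hypothetical helper: return every pair found as [keyword, value].
sub pairs_in {
    my ($src) = @_;
    my @pairs;
    while ( $src =~ /\G \s* $RE_pair/gcx ) {
        push @pairs, [ lc $1, defined $4 ? $4 : $3 ];
    }
    return @pairs;
}

for my $p ( pairs_in(q{ONE="xyz\t" two=a_34 three = 'name="O\'Hara"'}) ) {
    print "$p->[0] -> <$p->[1]>\n";
}
```

Because the value loop uses * rather than +?, this variant also accepts an empty quoted value such as one=""; whether that is desirable depends on the data.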
    q{one="two=xyz" two=abc}
    #      ^embedded keyword pattern

    q{one="xyz two=abc three= efg"}
    #         ^missing quotes^

In the first case, the embedded two=xyz is consumed by the engine as part of the quoted string, so it starts looking for a new match after the closing quote, thus rightly assigning "abc" to two and "two=xyz" to one.
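The behavior on these two strings can be checked with a short sketch. It reuses the posted $RE_value with its comments removed; the helper pairs_of is hypothetical and only lists what the regex captures, without the keyword validation of the full script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The posted regex, comments removed.
my $RE_value = qr/
    (\w+) \s* = \s*
    (?: ([\'\"\`]) ( (?: \\\2 | [^\2] )+? ) \2 | (\S+) )
/x;

# Hypothetical helper: list the captured pairs as "keyword=>value" strings.
sub pairs_of {
    my ($src) = @_;
    my @found;
    while ( $src =~ /\G \s* $RE_value \s* /gcx ) {
        push @found, lc($1) . '=>' . ( defined $4 ? $4 : $3 );
    }
    return @found;
}

for my $src ( q{one="two=xyz" two=abc}, q{one="xyz two=abc three= efg"} ) {
    print "$src\n";
    print "    $_\n" for pairs_of($src);
}
```

In this sketch the second string yields a single pair whose value is xyz two=abc three= efg, so the regex alone cannot tell the missing quotes from a legitimate quoted value; in the full script, two and three would simply be reported as missing.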
Re: Regex capturing either quoted strings or bare words (final backslash)
by tye (Sage) on Jan 13, 2003 at 18:53 UTC
by gmax (Abbot) on Jan 13, 2003 at 19:30 UTC
Re: Regex capturing either quoted strings or bare words
by BrowserUk (Patriarch) on Jan 13, 2003 at 18:22 UTC
Re: Regex capturing either quoted strings or bare words
by ihb (Deacon) on Jan 13, 2003 at 17:51 UTC
Re: Regex capturing either quoted strings or bare words
by IlyaM (Parson) on Jan 14, 2003 at 11:15 UTC