validating a quoted string

hellosarathy has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: validating a quoted string by choroba (Cardinal) on Jan 13, 2016 at 13:44 UTC
Do you just want to count the number of double quotes and report whether it's even or odd?

```perl
#!/usr/bin/perl
use warnings;
use strict;

for my $string (
    '"my" "dog"',
    '"my" "dog shepherd"',
    'my dog',
    'my "dog shepherd"',
    '"my "dog"',
    'my "dog shepherd',
) {
    # tr/"// without a replacement list only counts the double quotes
    my $valid = $string =~ tr/"// % 2 ? 0 : 1;
    print "$valid: $string\n";
}
```

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: validating a quoted string by Corion (Patriarch) on Jan 13, 2016 at 13:44 UTC
How does your regular expression fail? A first step would be to explain the regular expression and tell us in English words what `([.]?)` is supposed to do. Ideally, describe in English what the complete regular expression should do.

My approach to solving the problem would be to take all parts of a string between double quotes and check that no part contains a space. Alternatively, look at all "words" that are delimited by whitespace, and check that each either contains no double quotes or starts and ends with a double quote.

Update: I misread part of the problem:

`my "dog shepherd" <-- valid`

So that would imply that simply checking for an even number of double quotes is the simplest part.
Re^2: validating a quoted string by hellosarathy (Novice) on Jan 13, 2016 at 13:49 UTC
Sorry, my bad, that was a typo. It should be `$in =~ /(["]?)[^" ]+\1/`
Re^2: validating a quoted string by hellosarathy (Novice) on Jan 13, 2016 at 13:51 UTC
My query: `$in =~ /(["]?)[^" ]+\1/` doesn't work as expected. The input is a string of words separated by spaces. Sometimes two or more words appear within quotes and should be treated as a single string (just like the command-line arguments we pass to a script).
Re^3: validating a quoted string by Corion (Patriarch) on Jan 13, 2016 at 13:57 UTC
A problem with your current regular expression is that `[^" ]+` does not allow spaces. This fails, for example, for the following string, which I think should be valid:

`Corion says "hello sarathy"`

If you want to stay with your approach of matching the tokens (words or quoted parts), I suggest reading perlre, especially on alternation. For such an approach, I would restate the problem as: match every token that starts with a letter and consists only of letters, or starts with a double quote and consists of non-double-quotes.
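The token-by-token restatement above could be sketched roughly as follows. This is only an illustration, not code from the thread: the name `is_valid_tokens` is made up, and it assumes that a bare word is any run of characters other than whitespace and double quotes (looser than "only letters"), that a quoted part must be closed, and that tokens are separated by whitespace.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# One token: a bare word without quotes or whitespace (an assumption,
# looser than "only letters"), or a complete double-quoted part.
my $token = qr/ [^"\s]+ | "[^"]*" /x;

# Hypothetical helper: returns 1 if the whole string is a
# whitespace-separated sequence of tokens, 0 otherwise.
sub is_valid_tokens {
    my ($string) = @_;
    return $string =~ / \A \s* (?: $token (?: \s+ $token )* )? \s* \z /x ? 1 : 0;
}

for my $string (
    '"my" "dog"',
    'my "dog shepherd"',
    'Corion says "hello sarathy"',
    '"my "dog"',
    'my "dog shepherd',
) {
    printf "%-30s %s\n", $string, is_valid_tokens($string) ? 'valid' : 'invalid';
}
```

Because tokens must be separated by whitespace here, something like `EF"GH"` is rejected; whether that is wanted depends on how closely shell-style quoting should be followed.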
Re^3: validating a quoted string by BillKSmith (Monsignor) on Jan 13, 2016 at 14:41 UTC
I think that you want to parse the string the same way that a shell parses a command line. The result is a list of substrings. It is probably a good idea to do the validation (using the previous suggestions) before attempting the parse. I cannot think of a good way to do the parsing. I hope I have put other monks on the right track.

UPDATE: You can parse the strings with Text::CSV.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my @strings = (
    '"my" "dog"',
    '"my" "dog shepherd"',
    'my dog',
    'my "dog shepherd"',
    '"my "dog"',
    'my "dog shepherd',
);

# Treat a single space as the field separator.
my $csv = Text::CSV->new( { sep_char => ' ' } )
    or die "Cannot use CSV: " . Text::CSV->error_diag();

$" = ' | ';    # separator used when interpolating @$row

foreach my $string (@strings) {
    # An odd number of double quotes means one of them is unbalanced.
    if ( ( $string =~ tr/"// ) % 2 ) {
        warn "invalid string";
        next;
    }

    # Read the string through an in-memory filehandle.
    open my $fh, '<', \$string or die "Cannot open string";
    my $row = $csv->getline($fh);
    close $fh;

    if ( !defined $row ) {
        warn "getline failed";
        next;
    }
    print "@$row\n";
}
```

Bill
Re: validating a quoted string by choroba (Cardinal) on Jan 13, 2016 at 22:12 UTC
It seems there haven't been enough Marpa examples today. If you want to validate and split on unquoted space, build a proper parser!

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Marpa::R2;

my $dsl = << '__DSL__';

:default ::= action => ::first
lexeme default = latm => 1

List   ::= Token                   action => [value]
         | Token space List        action => list
Token  ::= Naked
         | Quoted
         | Quoted Token            action => concat
         | Naked Quoted            action => concat
         | Naked Quoted Token      action => concat
Quoted ::= ('"') InQ ('"')
InQ    ::= CharQ InQ               action => concat
         | CharQ
CharQ  ::= nonq
         | backslash quote         action => second
Naked  ::= CharO+                  action => concat
CharO  ::= nonqs
         | backslash quote         action => second
         | backslash single_space  action => second

backslash    ~ [\\]
quote        ~ '"'
single_space ~ [\s]
nonq         ~ [^"\\]+
nonqs        ~ [^"\s\\]+
space        ~ [\s]+

__DSL__

sub concat { shift; join q(), @_ }
sub list   { [ $_[1], @{ $_[3] } ] }
sub second { $_[2] }

my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });

for my $string (
    '"ab" "cd"',
    '"ef" "gh ij"',
    'kl mn',
    'op "qr st"',
    '"uv \"wx" yz',
    'AB\ CD',
    'EF"GH"',
    '"IJ""KL""MN"',
    'OP"QR"\ "ST"UV"WX YZ"\"ABC"DEF"\"GHI\"',
    '"abc\ def"',
    '"ghi "jkl"',
    'mno "pqr stu',
) {
    my $v;
    eval { $v = $grammar->parse(\$string, 'main'); 1 }
        or warn "invalid $string\n";
    say join '|', @$$v if $v;
}
```

Output:

```
ab|cd
ef|gh ij
kl|mn
op|qr st
uv "wx|yz
AB CD
EFGH
IJKLMN
OPQR STUVWX YZ"ABCDEF"GHI"
invalid "abc\ def"
invalid "ghi "jkl"
invalid mno "pqr stu
```
Re: validating a quoted string by hippo (Archbishop) on Jan 13, 2016 at 13:48 UTC
Count the double-quotes?

`say( ( () = $in =~ /"/g ) % 2 ? 'invalid' : 'valid' );`

Update: slower fingers than choroba, but nice to know others thought of the same simple approach.
Re: validating a quoted string by AnomalousMonk (Archbishop) on Jan 13, 2016 at 17:32 UTC
As others have replied, simply counting double-quote characters with `tr///` and discriminating based on an even/odd count looks like the best way. But if you've just gotta have a regex solution, maybe something like this will serve. Note that this code needs Perl version 5.10+ for the `(*SKIP)` and `(*FAIL)` verbs; see Special Backtracking Control Verbs in perlre. (Note also that I use `\x22` extensively in place of `"` because otherwise Windows command-line escaping becomes too weird.)

```
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
 ;;
 for my $string (
   '\"my\" \"dog\"', '\"my\" \"dog shepherd\"', 'my dog',
   'my \"dog shepherd\"', '\"my \"dog\"', 'my \"dog shepherd',
   ) {
   my $paired   = qr{ \x22 [^\x22]* (?: \\. [^\x22]* )* \x22 }xms;
   my $unpaired = qr{ \x22 [^\x22]* \z }xms;
   ;;
   my $orphan = $string =~ m{ $paired (*SKIP) (*FAIL) | $unpaired }xms;
   print qq{'$string' }, $orphan ? 'INVALID' : 'valid';
   }
 "
'"my" "dog"' valid
'"my" "dog shepherd"' valid
'my dog' valid
'my "dog shepherd"' valid
'"my "dog"' INVALID
'my "dog shepherd' INVALID
```

Give a man a fish: `<%-{-{-{-<`
Re: validating a quoted string by poj (Abbot) on Jan 13, 2016 at 16:00 UTC
Maybe use Text::ParseWords

```perl
#!perl
use strict;
use Text::ParseWords;

while (<DATA>){
  chomp;
  # split on whitespace, honouring quotes; 0 = strip the quotes
  my @parts = quotewords('\s+',0,$_);
  print join "|",@parts,"\n";
}
__DATA__
"my" "dog"
"my" "dog shepherd"
my dog
my "dog shepherd"
"my "dog"
my "dog shepherd
```

poj
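As far as I can tell, `quotewords` returns an empty list when the quotes in a line are unbalanced, so the same call might double as the validation step. A minimal sketch under that assumption; the helper name `parse_or_undef` is made up for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: returns an array ref of the parsed words, or undef
# when quotewords yields nothing (assumed here to signal unbalanced quotes;
# an empty input line also ends up as undef).
sub parse_or_undef {
    my ($line) = @_;
    my @parts = quotewords('\s+', 0, $line);
    return @parts ? \@parts : undef;
}

for my $line ( 'my "dog shepherd"', '"my "dog"', 'my "dog shepherd' ) {
    my $parts = parse_or_undef($line);
    print defined $parts ? join( '|', @$parts ) : "invalid: $line", "\n";
}
```

Text::ParseWords treats single quotes as quoting characters too, so a stray apostrophe in the input would also be reported as invalid.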