Re: validating a quoted string
by choroba (Cardinal) on Jan 13, 2016 at 13:44 UTC
Do you just want to count the number of double quotes and report whether it's even or odd?
#!/usr/bin/perl
use warnings;
use strict;
for my $string ( '"my" "dog"',
                 '"my" "dog shepherd"',
                 'my dog',
                 'my "dog shepherd"',
                 '"my "dog"',
                 'my "dog shepherd',
               ) {
    my $valid = $string =~ tr/"// % 2 ? 0 : 1;
    print "$valid: $string\n";
}
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: validating a quoted string
by Corion (Patriarch) on Jan 13, 2016 at 13:44 UTC
How does your regular expression fail?
A first step would be to explain the regular expression and tell us in English words what ([.]?) is supposed to do.
Ideally, describe in English what the complete regular expression should do.
My approach to solving the problem would be to take all parts of a string between double quotes and check that each part doesn't contain a space. Alternatively, look at all "words" that are delimited by whitespace, and check that each one either contains no double quotes or starts and ends with a double quote.
Update: I misread part of the problem:
my "dog shepherd" <-- valid
So that would imply that simply checking for an even number of double quotes is the simplest part.
|
|
Sorry, my bad, that was a typo. It should be
$in =~ /(["]?)[^" ]+\1/
|
|
$in =~ /(["]?)[^" ]+\1/
doesn't work as expected. I can have a string separated by spaces. Sometimes I can also have two or more strings within quotes that should be considered a single string (just like the command-line arguments we pass to a script).
|
|
A problem with your current regular expression is that [^" ]+ does not allow spaces.
This fails for example for the following string, which I think should be valid:
Corion says "hello sarathy"
If you want to stay with your approach of matching the tokens (words or quoted parts), I suggest reading perlre, especially on alternation.
For such an approach, I would restate the problem as: match every token that starts with a letter and consists only of letters, or starts with a double quote and consists of non-double-quotes.
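To make the alternation idea concrete, here is a minimal sketch. It is not Corion's code: the helper name tokens is made up for illustration, and it loosens "consists only of letters" to "any run of characters that are neither whitespace nor double quotes", which is an assumption on my part. It also assumes that a quoted part must be closed by a double quote and that backslash escapes are not supported.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns an array reference of tokens, or undef if
# something other than bare words and "quoted parts" is left over.
sub tokens {
    my ($string) = @_;
    my @tokens;
    # \G continues where the previous match stopped; /c keeps pos() on failure.
    while ($string =~ /\G\s*(?:"([^"]*)"|([^"\s]+))/gc) {
        push @tokens, defined $1 ? $1 : $2;
    }
    # Only trailing whitespace may remain, otherwise a quote was left open.
    return $string =~ /\G\s*\z/gc ? \@tokens : undef;
}

for my $string ('Corion says "hello sarathy"', 'my "dog shepherd') {
    my $tokens = tokens($string);
    print $tokens ? join('|', @$tokens) : 'invalid', "\n";
}
```

Returning undef rather than an empty list keeps an empty input (valid, no tokens) distinguishable from a string with an orphaned quote.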
|
|
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my @strings = (
    '"my" "dog"',
    '"my" "dog shepherd"',
    'my dog',
    'my "dog shepherd"',
    '"my "dog"',
    'my "dog shepherd',
);
my $csv = Text::CSV->new({ sep_char => ' ' })
    or die "Cannot use CSV: " . Text::CSV->error_diag();
foreach my $string (@strings) {
    # tr/"// with an empty replacement list only counts; it does not modify $string.
    if (($string =~ tr/"//) % 2) {
        warn "invalid string";
        next;
    }
    # Open the in-memory filehandle only for strings that passed the count check.
    open my $fh, '<', \$string or die "Cannot open string";
    my $row = $csv->getline($fh);
    close $fh;
    if (!defined $row) {
        warn "getline failed";
        next;
    }
    local $" = ' | ';
    print "@$row\n";
}
Re: validating a quoted string
by choroba (Cardinal) on Jan 13, 2016 at 22:12 UTC
It seems there haven't been enough Marpa examples today. If you want to validate and split on unquoted space, build a proper parser!
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Marpa::R2;
my $dsl = << '__DSL__';
:default ::= action => ::first
lexeme default = latm => 1
List   ::= Token                      action => [value]
         | Token space List           action => list
Token  ::= Naked
         | Quoted
         | Quoted Token               action => concat
         | Naked Quoted               action => concat
         | Naked Quoted Token         action => concat
Quoted ::= ('"') InQ ('"')
InQ    ::= CharQ InQ                  action => concat
         | CharQ
CharQ  ::= nonq
         | backslash quote            action => second
Naked  ::= CharO+                     action => concat
CharO  ::= nonqs
         | backslash quote            action => second
         | backslash single_space     action => second
backslash    ~ [\\]
quote        ~ '"'
single_space ~ [\s]
nonq         ~ [^"\\]+
nonqs        ~ [^"\s\\]+
space        ~ [\s]+
__DSL__
sub concat { shift; join q(), @_ }
sub list { [ $_[1], @{ $_[3] } ] }
sub second { $_[2] }
my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
for my $string (
    '"ab" "cd"',
    '"ef" "gh ij"',
    'kl mn',
    'op "qr st"',
    '"uv \"wx" yz',
    'AB\ CD',
    'EF"GH"',
    '"IJ""KL""MN"',
    'OP"QR"\ "ST"UV"WX YZ"\"ABC"DEF"\"GHI\"',
    '"abc\ def"',
    '"ghi "jkl"',
    'mno "pqr stu',
) {
    my $v;
    eval {
        $v = $grammar->parse(\$string, 'main');
    1 } or warn "invalid $string\n";
    say join '|', @$$v if $v;
}
Output:
ab|cd
ef|gh ij
kl|mn
op|qr st
uv "wx|yz
AB CD
EFGH
IJKLMN
OPQR STUVWX YZ"ABCDEF"GHI"
invalid "abc\ def"
invalid "ghi "jkl"
invalid mno "pqr stu
Re: validating a quoted string
by hippo (Archbishop) on Jan 13, 2016 at 13:48 UTC
my $quotes = () = $in =~ /"/g;   # count the matches by assigning to an empty list
say $quotes % 2 ? 'invalid' : 'valid';
Update: slower fingers than choroba but nice to know others thought of the same simple approach.
Re: validating a quoted string
by AnomalousMonk (Archbishop) on Jan 13, 2016 at 17:32 UTC
As others have replied, simply counting double-quote characters with tr/// and discriminating based on even/odd count looks like the best way.
But if you've just gotta have a regex solution, maybe something like this will serve. Note that this code needs Perl version 5.10+ for the (*SKIP) (*FAIL) verbs; see Special Backtracking Control Verbs in perlre. (Note also that I use \x22 extensively in place of " because otherwise Windows command-line escaping becomes too weird.)
c:\@Work\Perl\monks>perl -wMstrict -le
"use 5.010;
;;
 for my $string (
   '\"my\" \"dog\"',
   '\"my\" \"dog shepherd\"',
   'my dog',
   'my \"dog shepherd\"',
   '\"my \"dog\"',
   'my \"dog shepherd',
   ) {
   my $paired   = qr{ \x22 [^\x22]* (?: \\. [^\x22]*)* \x22 }xms;
   my $unpaired = qr{ \x22 [^\x22]* \z }xms;
   ;;
   my $orphan = $string =~ m{ $paired (*SKIP) (*FAIL) | $unpaired }xms;
   print qq{'$string' }, $orphan ? 'INVALID' : 'valid';
   }
"
'"my" "dog"' valid
'"my" "dog shepherd"' valid
'my dog' valid
'my "dog shepherd"' valid
'"my "dog"' INVALID
'my "dog shepherd' INVALID
Give a man a fish: <%-{-{-{-<
Re: validating a quoted string
by poj (Abbot) on Jan 13, 2016 at 16:00 UTC
#!perl
use strict;
use warnings;
use Text::ParseWords;
while (<DATA>){
    chomp;
    # quotewords() returns an empty list when the quotes are unbalanced
    my @parts = quotewords('\s+',0,$_);
    print join("|",@parts),"\n";
}
__DATA__
"my" "dog"
"my" "dog shepherd"
my dog
my "dog shepherd"
"my "dog"
my "dog shepherd
poj
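For anyone who wants a yes/no answer out of the Text::ParseWords approach, a minimal sketch follows. The wrapper name is_balanced is hypothetical, and treating an empty or whitespace-only line as valid is my assumption. Keep in mind that Text::ParseWords also treats single quotes as quote characters and the backslash as an escape, so it is stricter than a plain count of double quotes.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::ParseWords qw(quotewords);

# Hypothetical wrapper: true if quotewords() was able to split the line.
sub is_balanced {
    my ($line) = @_;
    # Assumption: a line with nothing to split counts as valid.
    return 1 if $line !~ /\S/;
    # An empty result for a non-blank line means the quotes did not balance.
    my @parts = quotewords('\s+', 0, $line);
    return @parts ? 1 : 0;
}

for my $line ('my "dog shepherd"', '"my "dog"', 'my "dog shepherd') {
    print is_balanced($line) ? 'valid' : 'invalid', ": $line\n";
}
```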