hellosarathy has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

How do I match, validate and extract space separated strings within quotes? I can have a string separated by spaces. Sometimes I can also have two or more strings within quotes to be considered as a single string (just like command line arguments we pass to a script).

I am taking two strings as input, and they can look like this:

Example:

"my" "dog"           <-- valid
"my" "dog shepherd"  <-- valid
my dog               <-- valid
my "dog shepherd"    <-- valid
"my "dog"            <-- invalid
my "dog shepherd     <-- invalid
My query:
$in =~ /(["]?)[^" ]+\$&\s+ (["]?)[^" ]+\$&/
doesn't work as expected. Please help.

Re: validating a quoted string
by choroba (Cardinal) on Jan 13, 2016 at 13:44 UTC
    Do you just want to count the number of double quotes and report whether it's even or odd?
    #!/usr/bin/perl
    use warnings;
    use strict;

    for my $string ( '"my" "dog"',
                     '"my" "dog shepherd"',
                     'my dog',
                     'my "dog shepherd"',
                     '"my "dog"',
                     'my "dog shepherd',
    ) {
        # tr/"// counts the double quotes; an odd count means invalid.
        my $valid = $string =~ tr/"// % 2 ? 0 : 1;
        print "$valid: $string\n";
    }
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: validating a quoted string
by Corion (Patriarch) on Jan 13, 2016 at 13:44 UTC

    How does your regular expression fail?

    A first step would be to explain the regular expression and tell us in English words what (["]?) is supposed to do.

    Ideally, describe in English what the complete regular expression should do.

    My approach to solving the problem would be to take all parts of a string between double quotes and check each part that it doesn't contain a space. Alternatively, look at all "words" that are delimited with whitespace, and check that they either contain no double quotes or start with a double quote and end with a double quote.

    Update: I misread part of the problem:

    my "dog shepherd" <-- valid

    So that would imply that simply checking for an even number of double quotes is the simplest part.

      Sorry, my bad, that was a typo. My query should be:
      $in =~ /(["]?)[^" ]+\1/
      It doesn't work as expected. I can have a string separated by spaces. Sometimes I can also have two or more strings within quotes to be considered as a single string (just like the command-line arguments we pass to a script).

        A problem with your current regular expression is that [^" ]+ does not allow spaces.

        This fails for example for the following string, which I think should be valid:

        Corion says "hello sarathy"

        If you want to stay with your approach of matching the tokens (words or quoted parts), I suggest reading perlre, especially on alternation.

        For such an approach, I would restate the problem as: "Match every token that starts with a letter and consists only of letters, or starts with a double quote and consists of non-double-quotes."
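A minimal sketch of that token-based restatement, using alternation. It is not code from this thread, and it makes some assumptions: a bare token is taken to be any run of characters that are neither whitespace nor double quotes (looser than "only letters"), a quoted token must be followed by whitespace or the end of the string, and the names `$token` and `tokens` are hypothetical.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# One token: a quoted part (captured without its quotes), or a bare
# word containing neither whitespace nor double quotes (an assumption).
my $token = qr{ " ([^"]*) " | ([^"\s]+) }x;

# Hypothetical helper: returns an array reference of tokens,
# or undef if the string is not a valid sequence of tokens.
sub tokens {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;
    # \G continues where the previous match ended; /c keeps that
    # position when a match fails, so it can be checked afterwards.
    while ($string =~ m{ \G \s* $token (?= \s | \z ) }gcx) {
        push @tokens, defined $1 ? $1 : $2;
    }
    # Valid only if nothing but trailing whitespace is left over.
    return $string =~ m{ \G \s* \z }gcx ? \@tokens : undef;
}

for my $string ('"my" "dog shepherd"', 'my "dog shepherd') {
    my $tokens = tokens($string);
    my $result = $tokens ? join('|', @$tokens) : 'invalid';
    print "$result: $string\n";
}
```

Because each token must be followed by whitespace or the end of the string, this sketch rejects adjacent pieces such as `EF"GH"`, which a shell would join into one word.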

        I think that you want to parse the string the same way that a shell parses a command-line. The result is a list of sub-strings. It probably is a good idea to do the validation (use previous suggestions) before attempting the parse. I cannot think of a good way to do the parsing. I hope I have put other monks on the right track.

        UPDATE

        You can parse the strings with Text::CSV

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Text::CSV;

        my @strings = (
            '"my" "dog"',
            '"my" "dog shepherd"',
            'my dog',
            'my "dog shepherd"',
            '"my "dog"',
            'my "dog shepherd',
        );
        my $csv = Text::CSV->new ( {sep_char => ' '} )
            or die "Cannot use CSV: ".Text::CSV->error_diag ();
        foreach my $string (@strings){
            open my $fh, '<', \$string or die "Cannot open string";
            # Validate first: an odd number of double quotes is invalid.
            if (((my $temp = $string) =~ tr/"//) % 2){
                warn "invalid string";
                next;
            }
            my $row = $csv->getline($fh);
            if (!defined $row) {
                warn "getline failed";
                next;
            }
            close $fh;
            $" = ' | ';
            print "@$row\n";
        }
        Bill
Re: validating a quoted string
by choroba (Cardinal) on Jan 13, 2016 at 22:12 UTC
    It seems there haven't been enough Marpa examples today. If you want to validate and split on unquoted space, build a proper parser!
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use Marpa::R2;

    my $dsl = << '__DSL__';

    :default ::= action => ::first
    lexeme default = latm => 1

    List   ::= Token                   action => [value]
             | Token space List        action => list
    Token  ::= Naked
             | Quoted
             | Quoted Token            action => concat
             | Naked Quoted            action => concat
             | Naked Quoted Token      action => concat
    Quoted ::= ('"') InQ ('"')
    InQ    ::= CharQ InQ               action => concat
             | CharQ
    CharQ  ::= nonq
             | backslash quote         action => second
    Naked  ::= CharO+                  action => concat
    CharO  ::= nonqs
             | backslash quote         action => second
             | backslash single_space  action => second

    backslash    ~ [\\]
    quote        ~ '"'
    single_space ~ [\s]
    nonq         ~ [^"\\]+
    nonqs        ~ [^"\s\\]+
    space        ~ [\s]+

    __DSL__

    sub concat { shift; join q(), @_ }
    sub list   { [ $_[1], @{ $_[3] } ] }
    sub second { $_[2] }

    my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });

    for my $string ( '"ab" "cd"',
                     '"ef" "gh ij"',
                     'kl mn',
                     'op "qr st"',
                     '"uv \"wx" yz',
                     'AB\ CD',
                     'EF"GH"',
                     '"IJ""KL""MN"',
                     'OP"QR"\ "ST"UV"WX YZ"\"ABC"DEF"\"GHI\"',
                     '"abc\ def"',
                     '"ghi "jkl"',
                     'mno "pqr stu',
    ) {
        my $v;
        eval { $v = $grammar->parse(\$string, 'main'); 1 }
            or warn "invalid $string\n";
        say join '|', @$$v if $v;
    }

    Output:

    ab|cd
    ef|gh ij
    kl|mn
    op|qr st
    uv "wx|yz
    AB CD
    EFGH
    IJKLMN
    OPQR STUVWX YZ"ABCDEF"GHI"
    invalid "abc\ def"
    invalid "ghi "jkl"
    invalid mno "pqr stu
Re: validating a quoted string
by hippo (Archbishop) on Jan 13, 2016 at 13:48 UTC

    Count the double-quotes?

    say $in =~ tr/"// % 2 ? 'invalid' : 'valid';

    Update: slower fingers than choroba but nice to know others thought of the same simple approach.

Re: validating a quoted string
by AnomalousMonk (Archbishop) on Jan 13, 2016 at 17:32 UTC

    As others have replied, simply counting double-quote characters with tr/// and discriminating based on an even/odd count looks like the best way.

    But if you've just gotta have a regex solution, maybe something like this will serve. Note that this code needs Perl version 5.10+ for the (*SKIP) (*FAIL) verbs; see Special Backtracking Control Verbs in perlre. (Note also that I use \x22 extensively in place of " because otherwise Windows command-line escaping becomes too weird.)

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
     ;;
     for my $string (
       '\"my\" \"dog\"', '\"my\" \"dog shepherd\"', 'my dog',
       'my \"dog shepherd\"', '\"my \"dog\"', 'my \"dog shepherd',
       ) {
       my $paired   = qr{ \x22 [^\x22]* (?: \\. [^\x22]*)* \x22 }xms;
       my $unpaired = qr{ \x22 [^\x22]* \z }xms;
       ;;
       my $orphan = $string =~ m{ $paired (*SKIP) (*FAIL) | $unpaired }xms;
       print qq{'$string' }, $orphan ? 'INVALID' : 'valid';
       }
    "
    '"my" "dog"' valid
    '"my" "dog shepherd"' valid
    'my dog' valid
    'my "dog shepherd"' valid
    '"my "dog"' INVALID
    'my "dog shepherd' INVALID


    Give a man a fish:  <%-{-{-{-<

Re: validating a quoted string
by poj (Abbot) on Jan 13, 2016 at 16:00 UTC

    Maybe use Text::ParseWords

    #!perl
    use strict;
    use Text::ParseWords;

    while (<DATA>){
      chomp;
      my @parts = quotewords('\s+',0,$_);
      print join "|",@parts,"\n";
    }
    __DATA__
    "my" "dog"
    "my" "dog shepherd"
    my dog
    my "dog shepherd"
    "my "dog"
    my "dog shepherd
    poj
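The Text::ParseWords suggestion can also cover the validation step. The sketch below is not from the thread. It assumes that quotewords() returns an empty list when it cannot parse a line, as happens with an unbalanced quote. The helper name `split_quoted` is hypothetical. Note that Text::ParseWords also treats single quotes and backslashes specially, which may or may not suit the original question.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: returns an array reference of parts, or undef
# when quotewords() gives back nothing for a non-empty string
# (assumed here to mean the quotes were unbalanced).
sub split_quoted {
    my ($string) = @_;
    my @parts = quotewords('\s+', 0, $string);
    return if !@parts && length $string;
    return \@parts;
}

for my $string ('my "dog shepherd"', 'my "dog shepherd') {
    my $parts  = split_quoted($string);
    my $result = $parts ? join('|', @$parts) : 'invalid';
    print "$result: $string\n";
}
```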