PerlMonks  

check if string is valid with special cases

by ovedpo15 (Pilgrim)
on May 15, 2018 at 10:13 UTC ( [id://1214533]=perlquestion )

ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Consider the following string: a,b,c,d,e,f
This string has 6 substrings and 5 commas between them. Each of the substrings can contain any character. In the end I would like to split it like this:

 my ($a,$b,$c,$d,$e,$f) = split(/,/, $string);

But first I would like to check that the string is valid, meaning that there is a substring between each pair of commas.
I could do it like this:  if(!defined($a) || !defined($b) || ... || !defined($f)) ...

but it doesn't look very good and it's too long. I would like to check it with split or a regex instead.
I also tried if(($string =~ tr/,//) != 5) ... but that isn't a good idea, because it won't catch the "a,b,c,d,,f" case or the case where one of the substrings contains a comma (for example: $b = "hello_world,bye";).

Replies are listed 'Best First'.
Re: check if string is valid with special cases
by swl (Parson) on May 15, 2018 at 10:48 UTC

    If your strings could contain embedded commas then use Text::CSV_XS or similar to split it into an array, then all or any from List::Util to check that none or some are blank.

    use Text::CSV_XS;
    use List::Util qw /any/;

    my $csv = Text::CSV_XS->new;

    my @strings = (
        'a,b,c,d,e,f',
        'a,b,,d,e,f',
        'a,b,"c,and,some,commas",d,e,f',
        'a,b,c,d,e',
    );

    foreach my $string (@strings) {
        print "Checking $string\n";
        my $status = $csv->parse ($string);
        my @array = $csv->fields;
        warn ($csv->error_input) if !$status;
        print "Did not get six entries in $string\n" if not @array == 6;
        print "one entry in $string is zero length\n" if any {!length $_} @array;
        print join ':', @array;
        print "\n\n";
    }
Re: check if string is valid with special cases (updated)
by haukex (Archbishop) on May 15, 2018 at 10:22 UTC

    TIMTOWTDI; personally I'd write:

    die "invalid format: $string" unless $string=~/\A[^,]+(?:,[^,]+){5}\z/;

    Update: Note your specification is a bit unclear, you say "each one of the substrings can contain whatever symbol there is" but then later on say that the strings can't* contain commas. What about, for example, "1,2,3, ,5,6" (which the above regex will call valid)? See also Re: How to ask better questions using Test::More and sample data.

    * Update 2: Sorry, I misunderstood your post, you're saying the strings can contain commas. See my reply below.

      I'm sorry, I meant that the substrings can contain commas. Nevertheless, is it possible to handle the space case (bad case: "1,2,3, ,4,5") in the regex you wrote? I think it's the regex I need.
        the substring can contain commas in it

        Sorry, this doesn't make sense to me. If the input string is "1,2,3,4,5,6,7", then does $a get "1,2", or does $b get "2,3", and so on...

        Or is this CSV, as swl correctly pointed out? Then you should use Text::CSV.

        If not, then as I said, please provide lots of examples of valid input with the expected output, as well as lots of examples of invalid input.
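If the fields cannot contain commas after all, one way to make the regex above reject whitespace-only fields such as the one in "1,2,3, ,5,6" is to require at least one non-whitespace character per field. This is only a sketch under that assumption, with exactly six fields expected; the names $field and is_valid are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical field pattern: a run of non-comma characters that
# contains at least one non-whitespace character.
my $field = qr/[^,]*[^,\s][^,]*/;

# Returns 1 if the string consists of exactly six such fields
# separated by five commas, 0 otherwise.
sub is_valid {
    my ($string) = @_;
    return $string =~ /\A$field(?:,$field){5}\z/ ? 1 : 0;
}

for my $string ('1,2,3,4,5,6', '1,2,3, ,5,6', '1,2,3,,5,6', '1,2, 3 ,4,5,6') {
    printf "%-16s %s\n", "'$string'", is_valid($string) ? 'valid' : 'invalid';
}
```

Fields with surrounding spaces, such as " 3 ", are still accepted here; whether they should be trimmed or rejected depends on the specification.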

Re: check if string is valid with special cases
by hippo (Bishop) on May 15, 2018 at 11:07 UTC

    If I've understood your specification correctly then this should perform the tests you require.

    use strict;
    use warnings;
    use Test::More tests => 2;

    my $valid_in   = 'a,b,c,d,e,f';
    my $invalid_in = 'a,b,c,d,,f';

    ok (test_it ($valid_in), "'$valid_in' is a valid argument");
    ok (!test_it ($invalid_in), "'$invalid_in' is an invalid argument");

    sub test_it {
        my ($string) = @_;
        my @fields = split (/,/, $string);
        my $emptyfields = grep { $_ eq '' } @fields;
        if (scalar @fields == 6 && $emptyfields == 0) {
            return @fields;
        }
        return 0;
    }
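One caveat with a plain split /,/: Perl's split discards trailing empty fields unless a negative limit is given, so an input such as 'a,b,c,d,e,f,,' would still produce six non-empty fields and pass the test above. A minimal variation, assuming trailing empty fields should count as invalid (the name fields_or_nothing is made up for illustration):

```perl
use strict;
use warnings;

# Returns the six fields if the string is valid, or an empty list otherwise.
sub fields_or_nothing {
    my ($string) = @_;

    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @fields = split /,/, $string, -1;

    return unless @fields == 6;
    return if grep { $_ eq '' } @fields;
    return @fields;
}

for my $string ('a,b,c,d,e,f', 'a,b,c,d,,f', 'a,b,c,d,e,f,,', 'a,b,c,d,e,') {
    my @fields = fields_or_nothing($string);
    print "'$string': ", (@fields ? "valid (@fields)" : 'invalid'), "\n";
}
```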
Re: check if string is valid with special cases (updated)
by AnomalousMonk (Archbishop) on May 15, 2018 at 14:09 UTC

    I agree with others that this looks like a Text::CSV problem (and also that your various problem statements are rather vague).

    However, in a general regex parsing (and regexes are often not the best approach to parsing) situation, my approach tends to be something like

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $s = 'a,b,c,d,e,f';
     ;;
     my $sym = qr{ [^,] }xms;
     my $sep = qr{ , }xms;
     ;;
     $s =~ m{ \A $sym (?: $sep $sym){5} \z }xms or die qq{bad string: '$s'};
     ;;
     my ($u, $v, $w, $x, $y, $z) = $s =~ m{ $sym }xmsg;
     dd $u, $v, $w, $x, $y, $z;
     "
    ("a", "b", "c", "d", "e", "f")
    Once you know a string is valid, it's often quite easy to strip out sub-strings of interest. Oh, you say the separator pattern should include possible spaces? Then
        my $sep = qr{ \s* , \s* }xms;
    Oh, the "symbol" may be more than a single character and must also exclude spaces? Then
        my $sym = qr{ [^,\s]+ }xms;
    And so on. Separately defining $sym and $sep makes them easy to change and makes any change propagate throughout the code as necessary (DRY).
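The same validate-then-extract idea can also be written as a small script rather than a one-liner. This is only a sketch: the sub name extract_symbols and its count parameter are assumptions for illustration, while $sym and $sep are the patterns discussed above.

```perl
use strict;
use warnings;

# Patterns as discussed above: a symbol is a run of non-comma,
# non-whitespace characters; a separator is a comma with optional spaces.
my $sym = qr{ [^,\s]+ }xms;
my $sep = qr{ \s* , \s* }xms;

# Validate first, then extract. In list context, returns the symbols,
# or an empty list if the string does not hold exactly $count symbols.
sub extract_symbols {
    my ($s, $count) = @_;
    my $more = $count - 1;    # symbols expected after the first one

    return unless $s =~ m{ \A $sym (?: $sep $sym){$more} \z }xms;
    return $s =~ m{ $sym }xmsg;
}

my @symbols = extract_symbols('a, b ,CC,d,e,fgh', 6);
print join(':', @symbols), "\n";
```

Because both steps use the same $sym and $sep, changing either pattern changes validation and extraction together.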

    And yes, test all this stuff! (Test::More and friends.)

    Update: Here's another approach to regex (again, maybe not the best option) parsing. It combines validation and extraction into a single regex, but the regex is significantly more complex and probably a bit slower. It also needs Perl version 5.10+ for the  \K operator.

    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "use 5.010;
     ;;
     my $s = 'a,b, CC ,d , e,fgh ';
     ;;
     my $sym = qr{ [^,\s]+ }xms;
     my $sep = qr{ \s* , \s* }xms;
     ;;
     my $n_syms = my ($u, $v, $w, $x, $y, $z) =
         $s =~ m{ (?: \G (?! \A) $sep | \A \s*) \K $sym (?= (?: $sep $sym)* \s* \z) }xmsg;
     ;;
     $n_syms == 6 or die qq{bad string: '$s'};
     dd $u, $v, $w, $x, $y, $z;
     "
    ("a", "b", "CC", "d", "e", "fgh")
    (Testing, testing...)


    Give a man a fish:  <%-{-{-{-<

Re: check if string is valid with special cases
by mr_ron (Chaplain) on May 15, 2018 at 21:29 UTC
    Like most of the other answers I will recommend Text::CSV but with a hopefully helpful explanation. Your question is asking for comma separated fields but a field can also contain a comma (,) enclosed in quotes ("). What about a field that needs to have " characters in it as well? CSV has a standard way of handling all these issues and it is not so easy to do with one regex. What about a,b,c,"",e,f ? Again CSV will just handle it. The use of List::Util suggested by swl makes a good refinement but hopefully the example below is a start:
    #!/usr/bin/env perl
    use Modern::Perl;
    use Text::CSV;

    my $csv = Text::CSV->new;

    my @strings = (
        'a,"b,c",3,4,5,6',
        'a,"b,c",3,4,5',
        'a,"b,c",3,4,5,""',
        'abc,def,"ghi,jkl",,mno,6'
    );

    foreach my $s (@strings) {
        if (my $status = $csv->parse($s)) {
            if ( (grep { /\w/ } $csv->fields) != 6 ) {
                warn "row failed: ", $csv->string
            }
        }
        else {
            warn $csv->error_input
        }
    }
    Ron
Re: check if string is valid with special cases
by kcott (Archbishop) on May 16, 2018 at 09:46 UTC

    G'day ovedpo15,

    You can do your validation with this condition:

    $string =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $string

    I ran these tests based on the ever-shifting goal posts throughout this thread. :-)

    $ alias perle
    alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'
    $ perle 'my $x = "a,b,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    Y
    $ perle 'my $x = "a,b ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    Y
    $ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
    $ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
    $ perle 'my $x = "a,,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
    N
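Read piece by piece, the condition counts the commas with y/,/,/ and compares that count with one less than the number of fields that still have some length once spaces are deleted. A more verbose sketch of the same check follows; the sub name fields_ok is made up for illustration, and the /r flag requires Perl 5.14 or later.

```perl
use strict;
use warnings;
use 5.014;    # for the /r (non-destructive) flag on y///, and for say

sub fields_ok {
    my ($string) = @_;

    # y/,/,/ replaces each comma with itself and returns how many it saw.
    my $commas = ( $string =~ y/,/,/ );

    # Keep only the fields that are non-empty after deleting spaces.
    my @non_blank = grep { length( y/ //dr ) } split /,/, $string;

    # N commas must separate exactly N + 1 non-blank fields.
    return $commas == @non_blank - 1;
}

say "'$_' => ", ( fields_ok($_) ? 'Y' : 'N' ) for 'a,b,c', 'a,b ,c', 'a, ,c', 'a,,c';
```

Note that this check does not fix the number of fields at six; it only verifies that every comma-separated field is non-blank.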

    — Ken

Re: check if string is valid with special cases
by Veltro (Hermit) on May 15, 2018 at 15:03 UTC
    -
