ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:
Consider the following string: a,b,c,d,e,f
This string has 6 substrings with 5 commas between them. Each substring can contain any character. In the end I would like to split it like this:
my ($a,$b,$c,$d,$e,$f) = split(/,/, $string);
But first I would like to check that the string is valid, meaning that there is a non-empty substring between each pair of commas.
I could do it like this:
if(!defined($a) || !defined($b) || ... || !defined($f)) ...
but it doesn't look very good and it's too long. I would like to check it with split or a regex instead.
I also tried if(($string =~ tr/,//) != 5) ... but it isn't a good idea, because it won't catch the "a,b,c,d,,f" case, or the case where one of the substrings contains a comma (for example: $b = "hello_world,bye";).
Re: check if string is valid with special cases
by swl (Parson) on May 15, 2018 at 10:48 UTC
If your strings could contain embedded commas then use Text::CSV_XS or similar to split it into an array, then all or any from List::Util to check that none or some are blank.
use Text::CSV_XS;
use List::Util qw /any/;

my $csv = Text::CSV_XS->new;
my @strings = (
    'a,b,c,d,e,f',
    'a,b,,d,e,f',
    'a,b,"c,and,some,commas",d,e,f',
    'a,b,c,d,e',
);

foreach my $string (@strings) {
    print "Checking $string\n";
    my $status = $csv->parse ($string);
    my @array = $csv->fields;
    warn ($csv->error_input)
      if !$status;
    print "Did not get six entries in $string\n"
      if not @array == 6;
    print "one entry in $string is zero length\n"
      if any {!length $_} @array;
    print join ':', @array;
    print "\n\n";
}
Re: check if string is valid with special cases (updated)
by haukex (Archbishop) on May 15, 2018 at 10:22 UTC
die "invalid format: $string"
unless $string=~/\A[^,]+(?:,[^,]+){5}\z/;
Update: Note your specification is a bit unclear, you say "each one of the substrings can contain whatever symbol there is" but then later on say that the strings can't* contain commas. What about, for example, "1,2,3, ,5,6" (which the above regex will call valid)? See also Re: How to ask better questions using Test::More and sample data.
* Update 2: Sorry, I misunderstood your post, you're saying the strings can contain commas. See my reply below.

I'm sorry, I meant that the substrings can contain commas. Nevertheless, is it possible to make the regex you wrote reject a whitespace-only field (bad case: "1,2,3, ,4,5")? I think it's the regex I need.

"the substring can contain commas in it"
Sorry, this doesn't make sense to me. If the input string is "1,2,3,4,5,6,7", then does $a get "1,2", or does $b get "2,3", and so on...
Or is this CSV, as swl correctly pointed out? Then you should use Text::CSV.
If not, then as I said, please provide lots of examples of valid input with the expected output, as well as lots of examples of invalid input.
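On the narrower question of rejecting a whitespace-only field, here is a minimal sketch. It assumes that fields never contain commas (which the thread has not settled), and the names $field and $valid are made up for illustration. Each field must contain at least one character that is neither a comma nor whitespace:

```perl
use strict;
use warnings;

# Assumption: no embedded commas. A field is any run of non-comma
# characters that includes at least one non-whitespace character.
my $field = qr/[^,]*[^,\s][^,]*/;
my $valid = qr/\A$field(?:,$field){5}\z/;

for my $string ('a,b,c,d,e,f', '1,2,3, ,4,5', 'a,b,c,d,,f', 'a, b ,c,d,e,f') {
    printf "%-15s %s\n", $string, $string =~ $valid ? 'valid' : 'invalid';
}
```

Only the first and last strings should be reported as valid. If quoted fields with embedded commas are really needed, this regex is not enough and a CSV parser is the safer route.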
Re: check if string is valid with special cases
by hippo (Bishop) on May 15, 2018 at 11:07 UTC
use strict;
use warnings;
use Test::More tests => 2;

my $valid_in = 'a,b,c,d,e,f';
my $invalid_in = 'a,b,c,d,,f';

ok (test_it ($valid_in), "'$valid_in' is a valid argument");
ok (!test_it ($invalid_in), "'$invalid_in' is an invalid argument");

sub test_it {
    my ($string) = @_;
    my @fields = split (/,/, $string);
    my $emptyfields = grep { $_ eq '' } @fields;
    if (scalar @fields == 6 &&
        $emptyfields == 0) {
        return @fields;
    }
    return 0;
}
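One edge case worth knowing about: without a LIMIT argument, split discards trailing empty fields, so an input with a trailing comma such as 'a,b,c,d,e,f,' still yields six non-empty fields. The sketch below is a possible variant, not part of the code above; test_strict is a hypothetical name. It passes a LIMIT of -1 so that trailing empty fields are kept and can be detected:

```perl
use strict;
use warnings;

# Hypothetical variant of test_it: a negative LIMIT makes split keep
# trailing empty fields instead of silently dropping them.
sub test_strict {
    my ($string) = @_;
    my @fields = split /,/, $string, -1;
    return 0 unless @fields == 6;            # wrong number of fields
    return 0 if grep { $_ eq '' } @fields;   # at least one empty field
    return 1;
}

print test_strict('a,b,c,d,e,f')  ? "valid\n" : "invalid\n";
print test_strict('a,b,c,d,e,f,') ? "valid\n" : "invalid\n";
print test_strict('a,b,c,d,e,')   ? "valid\n" : "invalid\n";
```

Only the first string should be reported as valid. Whether a trailing comma should count as an error depends on the specification, which is still open in this thread.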
Re: check if string is valid with special cases (updated)
by AnomalousMonk (Archbishop) on May 15, 2018 at 14:09 UTC
I agree with others that this looks like a Text::CSV problem (and also that your various problem statements are rather vague).
However, in a general regex parsing (and regexes are often not the best approach to parsing) situation, my approach tends to be something like
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $s = 'a,b,c,d,e,f';
;;
my $sym = qr{ [^,] }xms;
my $sep = qr{ , }xms;
;;
$s =~ m{ \A $sym (?: $sep $sym){5} \z }xms or die qq{bad string: '$s'};
;;
my ($u, $v, $w, $x, $y, $z) = $s =~ m{ $sym }xmsg;
dd $u, $v, $w, $x, $y, $z;
"
("a", "b", "c", "d", "e", "f")
Once you know a string is valid, it's often quite easy to strip out sub-strings of interest. Oh, you say the separator pattern should include possible spaces? Then
my $sep = qr{ \s* , \s* }xms;
Oh, the "symbol" may be more than a single character and must also exclude spaces? Then
my $sym = qr{ [^,\s]+ }xms;
And so on. Separately defining $sym and $sep makes them easy to change and makes any change propagate throughout the code as necessary (DRY).
And yes, test all this stuff! (Test::More and friends.)
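As a rough illustration of what such a test could look like, here is a small table-driven sketch using Test::More. It uses the whitespace-tolerant $sep and multi-character $sym variants from above; the is_valid helper and the list of cases are assumptions made up for this example:

```perl
use strict;
use warnings;
use Test::More;

my $sym = qr{ [^,\s]+ }xms;
my $sep = qr{ \s* , \s* }xms;

# Hypothetical helper wrapping the validation regex; returns 1 or 0.
sub is_valid {
    my ($s) = @_;
    return $s =~ m{ \A $sym (?: $sep $sym){5} \z }xms ? 1 : 0;
}

# Each case is [ input, expected result ].
my @cases = (
    [ 'a,b,c,d,e,f',     1 ],
    [ 'a, b ,c,d , e,f', 1 ],
    [ 'a,b,c,d,,f',      0 ],
    [ 'a,b,c, ,e,f',     0 ],
    [ 'a,b,c,d,e',       0 ],
);

is( is_valid($_->[0]), $_->[1], "'$_->[0]'" ) for @cases;
done_testing;
```

Adding a new rule then means adding a line to @cases first and adjusting $sym or $sep until the whole table passes.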
Update: Here's another approach to regex (again, maybe not the best option) parsing. It combines validation and extraction into a single regex, but the regex is significantly more complex and probably a bit slower. It also needs Perl version 5.10+ for the \K operator.
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"use 5.010;
;;
my $s = 'a,b, CC ,d , e,fgh ';
;;
my $sym = qr{ [^,\s]+ }xms;
my $sep = qr{ \s* , \s* }xms;
;;
my $n_syms =
my ($u, $v, $w, $x, $y, $z) =
$s =~ m{
(?: \G (?! \A) $sep | \A \s*) \K
$sym
(?= (?: $sep $sym)* \s* \z)
}xmsg;
;;
$n_syms == 6 or die qq{bad string: '$s'};
dd $u, $v, $w, $x, $y, $z;
"
("a", "b", "CC", "d", "e", "fgh")
(Testing, testing...)
Give a man a fish: <%-{-{-{-<
Re: check if string is valid with special cases
by mr_ron (Chaplain) on May 15, 2018 at 21:29 UTC
Like most of the other answers I will recommend Text::CSV but with a hopefully helpful explanation. Your question is asking for comma separated fields but a field can also contain a comma (,) enclosed in quotes ("). What about a field that needs to have " characters in it as well? CSV has a standard way of handling all these issues and it is not so easy to do with one regex. What about a,b,c,"",e,f ? Again CSV will just handle it. The use of List::Util suggested by swl makes a good refinement but hopefully the example below is a start:
#!/usr/bin/env perl
use Modern::Perl;
use Text::CSV;

my $csv = Text::CSV->new;
my @strings = (
    'a,"b,c",3,4,5,6',
    'a,"b,c",3,4,5',
    'a,"b,c",3,4,5,""',
    'abc,def,"ghi,jkl",,mno,6'
);

foreach my $s (@strings) {
    if (my $status = $csv->parse($s)) {
        if ( (grep { /\w/ } $csv->fields) != 6 ) {
            warn "row failed: ", $csv->string
        }
    }
    else {
        warn $csv->error_input
    }
}
Re: check if string is valid with special cases
by kcott (Archbishop) on May 16, 2018 at 09:46 UTC
$string =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $string
I ran these tests based on the ever-shifting goal posts throughout this thread. :-)
$ alias perle
alias perle='perl -Mstrict -Mwarnings -Mautodie=:all -E'
$ perle 'my $x = "a,b,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
Y
$ perle 'my $x = "a,b ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
Y
$ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
N
$ perle 'my $x = "a, ,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
N
$ perle 'my $x = "a,,c"; say +($x =~ y/,/,/ == -1 + grep length y/ //dr, split /,/, $x) ? "Y" : "N"'
N
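For readers who find the one-liner dense, here is the same check unpacked into a named function, as a sketch. The name fields_ok is made up for this example. It counts the commas, counts the fields that still have content once spaces are removed, and requires that the two differ by exactly one. Like the one-liner, it does not insist on exactly six fields:

```perl
use strict;
use warnings;
use 5.014;    # the /r flag on tr/// (y///) needs Perl 5.14 or later

# Hypothetical name for the unpacked check.
sub fields_ok {
    my ($string) = @_;

    # tr with an empty replacement list and no /d only counts matches.
    my $commas = ( $string =~ tr/,// );

    # tr/ //dr returns a copy of $_ with the spaces deleted; a field
    # counts only if something is left after that.
    my $non_blank = grep { length( tr/ //dr ) } split /,/, $string;

    return $commas == $non_blank - 1;
}

say fields_ok('a,b,c')  ? 'Y' : 'N';
say fields_ok('a,b ,c') ? 'Y' : 'N';
say fields_ok('a, ,c')  ? 'Y' : 'N';
say fields_ok('a,,c')   ? 'Y' : 'N';
```

This should print Y, Y, N, N, matching the runs above. Adding a comparison of $non_blank against 6 would turn it into a check for the six-field case from the original question.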
Re: check if string is valid with special cases
by Veltro (Hermit) on May 15, 2018 at 15:03 UTC