Re: Splitting a line on just commas
by ikegami (Patriarch) on Jun 13, 2010 at 18:38 UTC
| [reply] |
|
|
@arr = split /(".+?")|,/, $s;
print @arr;
a
b
"hey,you"
"str1, str2, str3"
end
| [reply] |
|
|
That doesn't work. It actually returns
(
'a',
undef,
'b',
undef,
'',
'"hey, you"',
'',
undef,
'',
'"str1, str2, str3"',
'',
undef,
'end',
)
Keep in mind the first arg of split is a separator.
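To see why the extra elements appear, here is a minimal sketch. The helper name split_with_capture and the shortened input are only for illustration. When the separator pattern contains a capture group, split also returns what the group captured for each separator, or undef where the group did not take part in the match.

```perl
use strict;
use warnings;

# Hypothetical helper: same pattern as the attempt above, where the
# quoted string is (mistakenly) part of the separator.
sub split_with_capture {
    my ($s) = @_;
    return split /(".+?")|,/, $s;
}

my @parts = split_with_capture('a,b,"hey, you",end');

# Print undef explicitly so it can be told apart from the empty string.
print join(' ', map { defined $_ ? "'$_'" : 'undef' } @parts), "\n";
```

Every plain comma contributes an undef (the group did not match), and every quoted string is treated as a separator sitting between two empty fields.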
Re: Splitting a line on just commas
by CountZero (Bishop) on Jun 13, 2010 at 18:39 UTC
One answer : Text::CSV
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re: Splitting a line on just commas
by BrowserUk (Patriarch) on Jun 13, 2010 at 19:37 UTC
$s = 'a,b,"hey, you","str1, str2, str3",end';;
print for split ',(?=\S)', $s;;
a
b
"hey, you"
"str1, str2, str3"
end
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
a,b,"hey,you",etc
a, b, "hey, you", etc
Which is why it was qualified with "if your sample is indicative". A more general solution can be found by focusing on the fields themselves, rather than the commas:
@fields = $s =~ /("[^"]*"|[^,]*),/gc;
($lastfield) = $s =~ /\G(.*)/; # list context, so we get the capture and not just a true/false result
push @fields, $lastfield;
But even that has no provision for placing a quotation mark inside a quoted string, and I'm sure there are other things I missed. The problem is hairier than it looks; hence, Text::CSV or Text::CSV_XS is best.
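As a rough sketch of how the embedded-quote case could be handled by hand, here is one way, assuming the data escapes a quote inside a quoted field by doubling it (""). That convention is an assumption about the input, not something stated in this thread, and the sub name parse_csv_line is made up for illustration. It is not a replacement for Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line field by field.
# Assumes a doubled quote ("") stands for a literal quote inside a
# quoted field. Stops early (returning what it has) on malformed input.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/g) {
        # Copy the captures before running another regex.
        my ($quoted, $bare, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # undo the doubled quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $bare;
        }
        last if $sep eq '';         # matched end of line, not a comma
    }
    return @fields;
}

print "[$_]\n" for parse_csv_line('a,b,"hey, you","say ""hi""",,end');
```

Unlike the snippets above, this strips the surrounding quotes and keeps empty fields. Anything outside that one quoting convention (backslash escapes, embedded newlines, stray quotes) is not handled.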
|
|
A more general solution can be found
The phrase "more general" is similar to "a bit pregnant".
There is little point in catering for one possibility not in evidence and not another. You should either cater for what is; or for every possibility.
As noted on Wikipedia, there is no single consistent standard for what constitutes CSV or TSV etc. The module we both mentioned therefore jumps through extraordinary hoops to try and cater for every possible variation--and inevitably fails.
But, individual sources of CSV output are usually self-consistent.
Just as we don't take a universal phrase book covering every known language--were such available--with us when travelling to a particular country, it rarely makes sense to try and cater for non-evident possibilities unless you are going to cater for them all. You're either doing more work than necessary; or not enough.
Re: Splitting a line on just commas
by Anonymous Monk on Jun 14, 2010 at 15:39 UTC
use Text::ParseWords;
my $line = q(a,b,"hey, you","str1, str2, str3",end);
@words = &quotewords(',', 0, $line);
| [reply] [d/l] |
|
|
use strict;
use warnings;
use Text::ParseWords;
my $line = q(a,b,"hey, you","str1, str2, str3",end);
my @words = quotewords(',', 1, $line);
print "$_\n" for @words;
__END__
a
b
"hey, you"
"str1, str2, str3"
end
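The two snippets above differ only in the second argument to quotewords, the "keep" flag. A small side-by-side sketch (the variable names are mine):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q(a,b,"hey, you","str1, str2, str3",end);

# keep = 0: the surrounding quotes are stripped from each field
my @stripped = quotewords(',', 0, $line);

# keep = 1: the quotes are left in place
my @kept = quotewords(',', 1, $line);

print "keep=0: $_\n" for @stripped;
print "keep=1: $_\n" for @kept;
```

Which one you want depends on whether the quotes are part of the data or just packaging. Note that Text::ParseWords also treats single quotes and backslashes specially, which may or may not suit your input.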
Re: Splitting a line on just commas
by reddydn (Initiate) on Jun 14, 2010 at 17:20 UTC
Try this
@arr = split /(".+?")|,/, $s ;
print @arr;
Re: Splitting a line on just commas
by deMize (Monk) on Jun 14, 2010 at 17:34 UTC
Response: I'd go the Text::CSV route, but this might help get you started
use strict;
sub main {
    my $text = qq{a,b,"hey, you","str1, str2, str3",end};
    print "Input: $text\n\n";
    # Split the delimiters
    my @values = split( /(?:\,|(\".*?\"))/ , $text);
    # Remove the created blanks
    @values = grep { $_ ne '' } @values;
    # Output
    foreach (0..$#values) {
        print "$_: $values[$_] \n";
    }
}
main();
Output:
Input: a,b,"hey, you","str1, str2, str3",end
0: a
1: b
2: "hey, you"
3: "str1, str2, str3"
4: end
Thoughts: I haven't really thought about why the blanks are being created; if you take away the grep, you'll see what I'm talking about. I still advise using Text::CSV, because this grep method will also remove blanks you want to keep. In other words, the above code has structural integrity problems.
Example: a,b,,d,e
You probably really want that placeholder there if you're going to be inserting this into a database. The grep would remove it because it has a blank string value ("").
Demize
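A quick sketch of the placeholder problem described above, using a plain comma split so the quoting issue stays out of the way (variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = 'a,b,,d,e';

my @all      = split /,/, $line;         # keeps the empty third field
my @filtered = grep { $_ ne '' } @all;   # silently drops it

print scalar(@all), " fields before grep, ", scalar(@filtered), " after\n";

# Trailing empty fields need a negative limit, or split discards them.
my @trailing = split /,/, 'a,b,,', -1;
print scalar(@trailing), " fields with limit -1\n";
```

After the grep, 'd' has moved from the fourth column to the third, which is exactly the kind of shift that would corrupt a database insert.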
|
|
        Thing being separated
        vvvvvvv
/(?:\,|(\".*?\"))/
    ^^
    Separator
How are those two things on equal footing?
| [reply] [d/l] [select] |
|
|
Response: I was about to say, you might want to remove all the undefined values created by the unmatched parens before removing the blank fields.
@values = grep{defined} @values;
@values = grep{$_ ne ''} @values;
or
@values = grep{defined && $_ ne ''} @values;
Again, I would not use this method. It's not good to remove blank string values.
As for the equal footing, would this be any less equal: /(?:\,)|(\".*?\")/
Update: I did forget to include the trailing comma after the quotes, but I still wouldn't use it: /\,|(?:(\".*?\")\,)/
Demize
Re: Splitting a line on just commas
by furry_marmot (Pilgrim) on Jun 16, 2010 at 22:24 UTC
$s = 'a,b,"hey, you","str1, str2, str3",end';
push @fields, $1 while $s =~ /("[^"]+"|[^",]+)(?:,|$)/g;
print "$_\n" for @fields;
or
$s = 'a,b,"hey, you","str1, str2, str3",end';
@fields = $s =~ /("[^"]+"|[^,]+)(?:,|$)/g; # Use +, not *, or you get a blank element
print "$_\n" for @fields;
Both print:
a
b
"hey, you"
"str1, str2, str3"
end