in reply to A better way to split CSV files with quoted strings that may contain commas?

As others have already pointed out, the "correct" way to deal with CSV files is to use one of the available modules. However, this is a FAQ, and a way of doing it by hand is discussed there.
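For illustration, here is a minimal sketch of the module route using Text::ParseWords, which ships with Perl. The choice of module is mine, not something the thread settled on, and it is not a full CSV parser: it treats backslash as the escape character rather than CSV-style doubled quotes. A dedicated CPAN module such as Text::CSV is the safer choice for real data. The sample record is the FAQ's example line.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);    # core module, no CPAN install needed

# Sample record: the example line from the FAQ.
my $text = q/SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"/;

# Split on commas; the second argument (0) tells quotewords to strip
# the surrounding quotes instead of keeping them in the result.
my @fields = quotewords(',', 0, $text);

print scalar(@fields), " fields\n";
print "  >>$_<<\n" for @fields;
```

Commas inside the quoted fields stay part of the field, so "Cimetrix, Inc" comes back as a single element.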

/J\


Replies are listed 'Best First'.
Re^2: A better way to split CSV files with quoted strings that may contain commas?
by ruzam (Curate) on Jun 01, 2006 at 02:54 UTC
    Wow! What a timely tidbit of information. I was just pondering the same problem and your link was most helpful. But as I discovered from the FAQ link, the example doesn't handle extra spacing (around commas) very well.
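To make the spacing problem concrete, here is a small sketch. The pattern is the FAQ's hand-rolled one as I remember it, so treat its exact wording as an assumption and check it against the FAQ itself; `faq_split` is just a hypothetical wrapper name for this demonstration.

```perl
use strict;
use warnings;

# faq_split: hypothetical wrapper around the FAQ-style pattern
# (pattern reproduced from memory, an assumption).
sub faq_split {
    my $text = shift;
    my @new;
    push(@new, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # double-quoted field
        | ([^,]+),?                      # unquoted field
        | ,                              # empty field
    }gx;
    push(@new, undef) if substr($text, -1, 1) eq ',';
    return @new;
}

my @tight = faq_split(q/a,"b, c",d/);       # no spaces around the commas
my @loose = faq_split(q/a , "b, c" , d/);   # spaces around the commas

print "tight: ", scalar(@tight), " fields: ", join('|', @tight), "\n";
print "loose: ", scalar(@loose), " fields: ", join('|', @loose), "\n";
```

With no spaces the quoted field survives intact and you get three fields. Once a space precedes the opening quote, the quoted alternative can no longer match at the start of the field, so the unquoted alternative takes over and the field is cut at its embedded comma, giving four fields.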

    Here's my 'space fixed' version, with checking for single quotes as well:
    use strict;
    use warnings;

    # crazy mix of quoting
    my $test1 = q/, "test with space ",,, mary had , a,, 'cake, with cheese' , and, "a, \"little" , lamb chop , /;
    # original FAQ example
    my $test2 = q/SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"/;
    # throw some wild spaces in there
    my $test3 = q/SAR001, "", "Cimetrix, Inc", "Bob Smith", "CAM",N, 8 ,1, 0 ,7, "Error, Core Dumped"/;
    # finally a nearly empty string
    my $test4 = q/,/;

    split_string($test1);
    split_string($test2);
    split_string($test3);
    split_string($test4);

    sub split_string {
        my $text = shift;
        my @new = ();
        push(@new, $+) while $text =~ m{
            \s*(
            # groups the phrase inside double quotes
            "([^\"\\]*(?:\\.[^\"\\]*)*)"\s*,?
            # groups the phrase inside single quotes
            | '([^\'\\]*(?:\\.[^\'\\]*)*)'\s*,?
            # trims leading/trailing space from phrase
            | ([^,\s]+(?:\s+[^,\s]+)*)\s*,?
            # just to grab empty phrases
            | (),
            )\s*}gx;
        push(@new, undef) if $text =~ m/,\s*$/;

        # just to prove it's working
        print "string: >>$text<<\n";
        foreach (@new) {
            print " part: >>" . (defined($_) ? $_ : '') . "<<\n";
        }
    }