Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am having problems parsing a CSV file. The error is in a field that separates its contents with a comma. I have tried almost everything. The Text::CSV_XS module still does not seem to solve my problem. I've been trying to find a solution for days. I would really appreciate help with this.

Example CSV Record:

12345,Jim,,"""Smith, Jr.""",1234 W. Baker Ct., ,Ontario,CA,12345,123456789, ,,27-Apr-02,1234567

Even when using Text::CSV_XS, the "Jr." is parsed as the address. Should I write a regex that replaces the comma within the triple double quotes with a pipe (|), parse the record, and then substitute the pipe back to a "," before writing? If so, I have no clue how to write the regex. For the record, I have tried to find answers on this site, but I am not having any luck. Please advise.

Thanks in advance for any and all help!!!

CPB

Re: Problems With Parsing CSV File
by grep (Monsignor) on Jul 22, 2002 at 05:47 UTC

    Hopefully this will help with your troubleshooting, but I get the correct result. Counting the six double quotes around the name from left to right: quotes 1 and 6 delimit a text field that allows embedded commas, while the pairs 2-3 and 4-5 are each a doubled quote, which stands for one literal double quote inside the quoted field.

    This is Perl 5.6.1 and Text::CSV_XS V 0.23


    example code from Text::CSV_XS documentation
    #!/usr/bin/perl -w
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new;
    my $column = '';
    my $sample_input_string = '12345,Jim,,"""Smith, Jr.""",1234 W. Baker Ct., ,Ontario,CA,12345,123456789, ,,27-Apr-02,1234567';
    if ($csv->parse($sample_input_string)) {
        my @field = $csv->fields;
        my $count = 0;
        for $column (@field) {
            print ++$count, " => ", $column, "\n";
        }
        print "\n";
    } else {
        my $err = $csv->error_input;
        print "parse() failed on argument: ", $err, "\n";
    }

    Produces

    1 => 12345
    2 => Jim
    3 =>
    4 => "Smith, Jr."
    5 => 1234 W. Baker Ct.
    6 =>
    7 => Ontario
    8 => CA
    9 => 12345
    10 => 123456789
    11 =>
    12 =>
    13 => 27-Apr-02
    14 => 1234567
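
The quoting rule behind this result can also be seen without any module. What follows is only a minimal sketch in core Perl, not how Text::CSV_XS is implemented: split_csv_record is a hypothetical helper name, and it assumes one record per string with no embedded newlines. For real data the module is still the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV record into fields.
# A field is either quoted ("..." with "" standing for one literal quote)
# or plain (no commas and no quotes).
sub split_csv_record {
    my ($record) = @_;
    my @fields;
    pos($record) = 0;
    while (1) {
        # Consume one field plus its terminator (a comma or end of string).
        $record =~ /\G(?:"((?:[^"]|"")*)"|([^,"]*))(,|\z)/gc
            or die "malformed CSV near offset " . pos($record) . "\n";
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # collapse doubled quotes
            push @fields, $quoted;
        } else {
            push @fields, $plain;
        }
        last if $sep eq '';         # terminator was end of string
    }
    return @fields;
}

my $line = '12345,Jim,,"""Smith, Jr.""",1234 W. Baker Ct., ,Ontario';
my @parsed = split_csv_record($line);
my $n = 0;
print ++$n, " => ", $_, "\n" for @parsed;
```

With the shortened sample record used here, the fourth field comes out as "Smith, Jr." with its literal quotes and its comma intact, and the address stays in field 5.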


    grep
    XP matters not. Look at me. Judge me by my XP, do you?
Re: Problems With Parsing CSV File
by Anonymous Monk on Jul 22, 2002 at 11:17 UTC
Re: Problems With Parsing CSV File
by fuzzysteve (Beadle) on Jul 22, 2002 at 11:49 UTC
    While you've been given better answers, if you're interested, the regexp:

    s/\"\"\"(.+?),(.+?)\"\"\"/\"\"\"$1|$2\"\"\"/

    should do it
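
The round trip asked about in the question (mask the comma, split, restore it) could look roughly like the sketch below. It assumes at most one comma inside the triple-quoted field, no other quoted commas in the record, and no pipe characters in the data; the variable names are illustrative only. The backslashes before the quotes in the substitution are optional, so they are omitted here.

```perl
use strict;
use warnings;

my $record = '12345,Jim,,"""Smith, Jr.""",1234 W. Baker Ct., ,Ontario,CA,12345,123456789, ,,27-Apr-02,1234567';

# Step 1: hide the comma inside the triple-quoted field behind a pipe.
(my $masked = $record) =~ s/"""(.+?),(.+?)"""/"""$1|$2"""/;

# Step 2: a plain split is now safe; the -1 limit keeps trailing empty fields.
my @fields = split /,/, $masked, -1;

# Step 3: turn the pipe back into a comma in every field.
tr/|/,/ for @fields;

# Joining the fields again reproduces the original record.
my $rebuilt = join ',', @fields;

print scalar(@fields), " fields\n";
print "name field: $fields[3]\n";
```

Note that the name field keeps its raw triple quotes with this approach, since nothing here interprets the CSV quoting; that is one reason the module-based answer is the better one.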