vs95054 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Gurus,

I am no expert in Perl, but I know that my requirement can only be met using Perl (versus a shell script) in this case. I have a CSV file which looks like the one in the "Input" section. A few of its fields are double-quoted AND also contain a "," (comma). I need to remove the commas that are inside a pair of double quotes, as shown in the "Output" section.

Could someone please help me find a way to do this? Please note that one line may contain multiple double-quoted fields (e.g. line 2). I need to remove commas only inside double-quoted fields, not the field separators.

Thanks, VS95054

Input:

serial number,name, designation, division, city
1, Tom, IT Manager, "IT Deptt, XYZ company", San Jose
2, Peter, "SOX Auditor, Internal", "ABC Deptt, Amazon Inc", Seattle
3, Randy, "Quality Engineer, Prod", MIS, Santa Clara

Output:

serial number,name, designation, division, city
1, Tom, IT Manager, "IT Deptt XYZ company", San Jose
2, Peter, "SOX Auditor Internal", "ABC Deptt Amazon Inc", Seattle
3, Randy, "Quality Engineer Prod", MIS, Santa Clara

Re: Need help with double quotes and CSV file processing
by NetWallah (Canon) on Nov 04, 2013 at 20:22 UTC
    use Text::CSV;

    Here is a tutorial to get you started.

    From Text::CSV:

    quote_space

    By default, a space in a field would trigger quotation. As no rule exists this to be forced in CSV, nor any for the opposite, the default is true for safety. You can exclude the space from this trigger by setting this option to 0.

    Removing all commas inside a field is a matter of:

    s/,//g;
    ## OR ##
    # tr/,/ /;
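To tie the pieces together, here is a minimal sketch of how the substitution might be combined with Text::CSV for the sample data in the question. It is an illustration under assumptions, not a definitive solution: the helper name strip_inner_commas is made up for this example, and the allow_whitespace option is switched on only because the sample rows have a blank after each separator, which the parser would otherwise not accept in front of a quoted field.

```perl
use strict;
use warnings;
use Text::CSV;

# allow_whitespace => 1 : accept (and strip) the blanks around the separators
#                         that appear in the sample input.
# quote_space      => 0 : on output, do not quote a field merely because it
#                         contains a space.
my $csv = Text::CSV->new({
    binary           => 1,
    auto_diag        => 1,
    allow_whitespace => 1,
    quote_space      => 0,
});

# Hypothetical helper: parse one CSV line, delete every comma found inside
# a field, and return the line re-assembled by Text::CSV.
sub strip_inner_commas {
    my ($line) = @_;
    $csv->parse($line) or die "Cannot parse line: $line\n";
    my @fields = $csv->fields;
    s/,//g for @fields;    # only commas inside field values are touched
    $csv->combine(@fields) or die "Cannot combine fields\n";
    return $csv->string;
}

print strip_inner_commas($_), "\n" for (
    '1, Tom, IT Manager, "IT Deptt, XYZ company", San Jose',
    '2, Peter, "SOX Auditor, Internal", "ABC Deptt, Amazon Inc", Seattle',
);
```

Note that the result is re-serialised by the module rather than edited in place, so it will not match the requested output character for character: the blanks after the separators are stripped, and once the commas are gone the fields no longer need quoting. The first row, for example, should come out as 1,Tom,IT Manager,IT Deptt XYZ company,San Jose.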

                 When in doubt, mumble; when in trouble, delegate; when in charge, ponder. -- James H. Boren

      I just looked at that "tutorial", and have to conclude that it is one of the worst tutorials I have ever seen. THE reason to use a CSV parsing module like Text::CSV_XS or Text::CSV is that, as in the case of the OP, the format is not straightforward: fields may contain " or , or even newlines. Both modules deal with that transparently.

      My advice: DO NOT READ that tutorial. Just read the manual for either module and follow the SYNOPSIS and examples:

      use Text::CSV_XS;

      my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
      open my $fh, "<", "file.csv" or die "file.csv: $!";
      while (my $row = $csv->getline ($fh)) {
          # do something with @$row
          }
      close $fh or die "file.csv: $!";

      update 2013-11-25: That tutorial has been completely rewritten to reflect the current state of affairs, and is now well worth looking at.


      Enjoy, Have FUN! H.Merijn
Re: Need help with double quotes and CSV file processing
by Anonymous Monk on Nov 04, 2013 at 21:54 UTC
    The bottom line is simply this: "this is A(nother) Problem That Has Been (Thoroughly...) Solved Before." By Text::CSV.

    Use it and enjoy the Goodness.
Re: Need help with double quotes and CSV file processing
by lightoverhead (Pilgrim) on Nov 04, 2013 at 20:39 UTC

    You can also use Tie::CSV_File.

    Then you can take care of each field value such as removing the comma inside a field.