in reply to Multiple double quotes within csv

I agree with Laurent_R; expanding on the thought a bit more...

This line looks odd to me:

Field1, ""Quoted String" rest of the data, presumably Field3 is here, Field4

If there are unbalanced quotes in Field2, as you are showing, that could definitely cause problems. What do you think the two sequential " characters at the beginning of Field2 mean?

It would be best if you showed a short piece of code that demonstrates exactly your problem, and please specify exactly what the expected output should be. As a suggestion, I would also recommend that you import your CSV into Excel and see what it does with it. You can also use Excel as a "CSV reference implementation": enter a row of data and then see what Excel generates. I have never worked for MS and I am not an Excel "fan", but I have never seen Text::CSV fail to parse something that Excel generated.

At this point, I am not sure whether we are dealing with an improper CSV format, a Text::CSV error, or something else. I have used this CSV module and gotten good results with it. The CSV format is devilishly complicated when weird cases are considered. It is best if we can work with a verbatim example; I would put it within <code>..</code> blocks to be sure that everybody is talking about exactly the same thing.

Update: See the post below with the test case from the OP and others from me. This looks like invalid CSV. With more examples of these invalid lines, I suppose an ad-hoc algorithm could be designed to "fix" the CSV before feeding it into Text::CSV.

Re^2: Multiple double quotes within csv
by mhooper (Novice) on May 13, 2017 at 00:22 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;
    my $csv = Text::CSV->new({ sep_char => ',' });
    while (my $line = <DATA>)
      if ($csv->parse($line)) {
        my @fields = $csv->fields();
        print print "$fields[0],";
        print "$fields[1],";
        print "$fields[2]\n";
      } else {
        warn "Line could not be parsed: $line\n";
      }
    }
    __DATA__
    0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
    I was hoping to get the output
    0,"Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
    but this would also be acceptable:
    0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      use feature 'say';   # needed for say below
      use Text::CSV;
      my $csv = Text::CSV->new ({
          auto_diag           => 1,
          allow_loose_quotes  => 1, # optional, also works without this attribute
          allow_loose_escapes => 1,
          });
      while (my $row = $csv->getline (*DATA)) {
          say for @$row;
          }
      __END__
      0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

      will produce

      0
      "Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052
      9
        I tested this and verified that:
        my $csv = Text::CSV_XS->new({allow_loose_escapes => 1,});
        parses the OP's example correctly with my code using Text::CSV_XS.
        Input Line:  8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
        Output Line: 8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
        Apparently although the OP's CSV line is technically incorrect according to the CSV spec, some smart folks who wrote Text::CSV anticipated this and have an option for it.

        Thank you Tux!

        As a comment: I always use the pipe character, |, when I generate "CSV" files.
        Using a separator character other than the comma is allowed by the spec. What is shown as the "output" line above would be my "input" line, with no need to use a complex module to parse things (in most, but not all, cases). Of course, we have to deal with what we get from others; such is the nature of the beast.
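
        To illustrate the point about pipe-delimited data, here is a minimal sketch of parsing such a line with plain split. The helper name split_pipe_record is made up for this example, and the sketch assumes that no field ever contains a | or a newline; when that assumption does not hold, a real CSV module is still the safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one pipe-delimited record into fields.
# Assumes no field ever contains '|' or a newline; that is an assumption
# about the data, not something the format guarantees.
sub split_pipe_record {
    my ($line) = @_;
    chomp $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    return split /\|/, $line, -1;
}

my @fields = split_pipe_record(
    '8|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9'
);
print scalar(@fields), " fields\n";   # prints: 3 fields
print "$_\n" for @fields;
```

        The embedded double quotes need no special handling here because they are ordinary data once the quote character plays no role in the format.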

      Your code does not compile.

      Here is some working code, and the output it produces:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Text::CSV;
      my $csv = Text::CSV->new({ sep_char => ',', quote_char => undef, escape_char => undef });
      while (my $line = <DATA>){
        if ($csv->parse($line)) {
          my @fields = $csv->fields();
          print "$fields[0],";
          print "$fields[1],";
          print "$fields[2]\n";
        } else {
          warn "Line could not be parsed: '$line'\n";
          my ($cde, $str, $pos, $rec, $fld) = $csv->error_diag ();
          print "DIAG:(CDE=$cde, STR=$str, POS=$pos, REC=$rec, FLD=$fld)\n"
        }
      }
      __DATA__
      0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

      >perl test2.pl
      0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      The quotes at the start of Rat Control are problematic, and produce this error on default settings:
      DIAG:(CDE=2023, STR=EIQ - QUO character not allowed, POS=4, REC=1, FLD=2)


        Re the good Abbot NetWallah's observation that your code "doesn't compile" -- spot on and ++, even though his code could still encounter problems with slight variations in the non-conformity of the CSV (discussion below) -- here's why, and how to fix the compilation-failure part of the problem:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # OP's code from question at #1190163

        use Text::CSV;
        my $csv = Text::CSV->new({ sep_char => ',' });
        while (my $line = <DATA>) {              # added open curly
          if ($csv->parse($line)) {
            (my @fields) = $csv->fields();       # enclosed my @fields in () so $csv does not mask earlier
            print print "$fields[0],";
            print "$fields[1],";
            print "$fields[2]\n";
          } else {
            warn "Line could not be parsed: $line\n";
          }
        }
        __DATA__
        0,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

        If what you posted is what you ran -- complete with strict and warnings -- then your attempt to run your code should have at least hinted at what was wrong. Adding use diagnostics (or perhaps use diagnostics -verbose, or filtering your program's messages through splain) would have at least allowed you to post code without errors... something the Monks hold to be an indicator that you're serious about learning rather than merely using us as human debuggers.

        As to the underlying problems, attend carefully to Marshall's thorough examination and exposition ... and join him in thanks to Tux for the module.

        Don't set quote_char and escape_char to undef, as that will cause any field that contains a sep_char to break your data. For the question at hand, options like allow_loose_quotes and allow_loose_escapes are usually the way to go. In this particular case, setting escape_char to undef might work, but never set quote_char to undef to "fix" this kind of situation.
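
        The warning above can be checked with a small comparison. This is only a sketch: the sample line is the 5,"123,456",abc case used elsewhere in this thread, and the script simply reports how many fields each parser returns rather than asserting a result. With quoting disabled, the comma inside the quoted value should be treated as just another separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# With quoting switched off, the quote characters are ordinary data and
# every comma is a separator, including the one inside "123,456".
my $no_quotes = Text::CSV->new({ quote_char => undef, escape_char => undef });

# Default settings, for comparison.
my $default = Text::CSV->new();

my $line = '5,"123,456",abc';

for my $pair ([ 'quote_char => undef', $no_quotes ], [ 'defaults', $default ]) {
    my ($label, $csv) = @$pair;
    if ($csv->parse($line)) {
        my @fields = $csv->fields();
        print "$label: ", scalar(@fields), " fields: ", join('|', @fields), "\n";
    } else {
        print "$label: parse failed\n";
    }
}
```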

      Update: I think I'm closer:
      RFC 4180, section 2, rule 7: "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."

      So:

      2,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      is malformed CSV. It should be:
      2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
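
      The escaping rule quoted above can be sketched in a few lines of plain Perl. The helper rfc4180_quote is a made-up name for this illustration and is not part of Text::CSV; it only shows how the corrected line is derived from the raw field value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV): quote one field the way
# RFC 4180 describes, doubling every embedded double quote.
sub rfc4180_quote {
    my ($field) = @_;
    (my $escaped = $field) =~ s/"/""/g;   # " becomes ""
    return qq{"$escaped"};                # then enclose the whole field
}

my @fields = (2, '"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052', 9);

# Only the middle field needs quoting here; the numbers are left bare.
print join(',', $fields[0], rfc4180_quote($fields[1]), $fields[2]), "\n";
# prints: 2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
```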
      I made some experiments. Here are my (updated) results:
      #!/usr/bin/perl
      use strict;
      use warnings;
      $|=1;   ## turn off buffering for STDOUT
      use Text::CSV_XS qw( csv );
      my $csv = Text::CSV_XS->new();   #using the defaults

      while (my $line = <DATA>) {
        if ($csv->parse($line)) {
          my @fields = $csv->fields();
          print join ("|",@fields),"\n";
        } else {
          warn "Line could not be parsed: $line\n";
        }
      }
      =Prints:
      1|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
      2|"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
      3|Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052|9
      Line could not be parsed: 4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
      5|123,456|abc
      6|Rat|xyz
      7|Rat Control|xyz
      Line could not be parsed: 8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      =cut
      __DATA__
      1,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
      2,"""Rat Control"" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9
      3,Rat Control <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
      4,"Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052,9
      5,"123,456",abc
      6,"Rat",xyz
      7,"Rat Control",xyz
      8,""Rat Control" <sip:+15559999999@192.168 .5.233>;tag=gK004bb052",9

      I do not understand why line 4, which starts with unnecessary quotes, is not parsed. Update: it could be that the double quotes must apply to the whole field, and therefore the syntax in line 2 must be used. See line 5, which has an embedded comma, requires the quotes, and is parsed correctly. See also lines 6 and 7. I don't think the starting quotes are the issue; it appears that other "special" characters in Field2 are causing the problem.
      Instead of sep_char try
      binary => 1, allow_loose_quotes => 1, blank_is_undef => 1, escape_char => undef,