in reply to A problem with Text::CSV

I think I can say with confidence that there are no missing records, but there are 274 records with embedded newlines, so they are split over multiple lines in the CSV file. With embedded newlines, wc -l will not reliably tell you how many records there are.
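To illustrate the difference between physical lines and records, here is a minimal sketch. The sample data is made up for illustration, and it assumes Text::CSV is installed and that reading from an in-memory filehandle is acceptable for a demo. It is not the original poster's script.

```perl
use strict;
use warnings;
use Text::CSV;

# Made-up sample: three records, the second with a newline inside a quoted field.
my $data = qq{1,"plain",a\n2,"two\nlines",b\n3,"plain",c\n};

# Physical lines, which is what wc -l would report.
my $lines = () = $data =~ /\n/g;

# Logical records; binary => 1 permits embedded newlines in quoted fields.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open my $io, '<', \$data or die "Cannot open in-memory file: $!";
my $records = 0;
$records++ while $csv->getline($io);
close $io;

print "physical lines: $lines, records: $records\n";
```

For this sample the script reports four physical lines but three records, which is the same kind of gap seen between wc -l and the getline count.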


Enjoy, Have FUN! H.Merijn

Re^2: A problem with Text::CSV
by lihao (Monk) on Mar 26, 2008 at 20:49 UTC

    Ah, you are right, some CSV records contain newlines, so the record count of 119606 should be OK, right? I guess my second method is wrong. :)

    One last question: is there any way I can check whether any records are missing by using the $csv->getline($io) method? :-)

    Thanks again

    lihao

Re^2: A problem with Text::CSV
by lihao (Monk) on Mar 26, 2008 at 20:40 UTC

    Hi, thank you for the fast response:)

    I changed the while loop and found that some of the bad records were caused by a trailing ^M. After removing those, I am now missing 137 records, and I can find the problematic records by line number. :-) I will check whether there are embedded newlines within these fields.

    many thanks

    lihao

    while (my $record = <$csvfile>) {
        $line_no++;
        chomp $record;
        $record =~ s/\cM$//;
        if ($csv->parse($record)) {
            my @columns = $csv->fields();
            my $value = "$columns[9], $columns[2]";
            printf {$fout} "[good][%06d] %s\n", $line_no, $value;
        }
        else {
            printf {$fout} "[bad][%06d] %s\n", $line_no, $record;
        }
    }

      This is changing your script from good to bad. This is exactly what you should NOT do. The trailing ^M is part of the field and should not be removed.

      Use the comma-counting code from Narveson, and check whether the lines that have a trailing ^M also have fewer commas than the lines that seem to be correct. Note that even that is unreliable, as commas can be part of a field when correctly quoted.
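Narveson's code is not reproduced in this thread, so the following is only a rough sketch of what comma counting could look like, not that code. The sample lines are hypothetical. It groups line numbers by their comma count so outliers stand out.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the second is a valid record with a quoted comma.
my @lines = (
    qq{1,alpha,beta\r},
    qq{2,"gamma, delta",epsilon},
    qq{3,zeta},
);

my %seen;          # comma count => list of line numbers
my $line_no = 0;
for my $line (@lines) {
    $line_no++;
    my $commas = ($line =~ tr/,//);   # tr counts commas without modifying $line
    push @{ $seen{$commas} }, $line_no;
}

for my $count (sort { $a <=> $b } keys %seen) {
    printf "%d commas: lines %s\n", $count, join ', ', @{ $seen{$count} };
}
```

Line 2 of the sample has one more comma than line 1 even though both have three fields, which shows why a raw comma count can only hint at a problem.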

      The best way to find the problematic lines (if any) is to call the new () constructor with no arguments at all, and see where the parsing stops. Then use Text::CSV's diagnostics to see what caused the stop.
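A minimal sketch of that approach, assuming Text::CSV is installed and using made-up sample data read from an in-memory filehandle. With the default constructor the binary attribute is off, so an embedded newline should make getline stop, and error_diag then describes why.

```perl
use strict;
use warnings;
use Text::CSV;

# Made-up sample: the second record has a newline inside a quoted field.
my $data = qq{1,"plain",a\n2,"two\nlines",b\n3,"plain",c\n};

# Default constructor: no attributes, so binary is off.
my $csv = Text::CSV->new()
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open my $io, '<', \$data or die "Cannot open in-memory file: $!";
my $records = 0;
$records++ while $csv->getline($io);

# If getline stopped before end of file, ask the parser what went wrong.
unless ($csv->eof) {
    my ($code, $message, $position) = $csv->error_diag();
    print "Parsing stopped after $records good record(s)\n";
    print "Error $code: $message (position $position)\n";
}
close $io;
```

On this sample, parsing should stop at the second record, and the count of good records points at the first problematic one. On a real file, the same loop can be pointed at the actual filehandle.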


      Enjoy, Have FUN! H.Merijn