PerlMonks  

Re^2: A problem with Text::CSV

by lihao (Monk)
on Mar 26, 2008 at 20:40 UTC ( [id://676533] )


in reply to Re: A problem with Text::CSV
in thread A problem with Text::CSV

Hi, thank you for the fast response:)

I changed the while loop and found that some of the failing records were caused by a trailing ^M. After removing those, 137 records are still missing, and I can now locate the problematic records by line number :-) I will check whether there are embedded newlines within these fields.

many thanks

lihao

while (my $record = <$csvfile>) {
    $line_no++;
    chomp $record;
    $record =~ s/\cM$//;
    if ($csv->parse($record)) {
        my @columns = $csv->fields();
        my $value = "$columns[9], $columns[2]";
        printf {$fout} "[good][%06d] %s\n", $line_no, $value;
    }
    else {
        printf {$fout} "[bad][%06d] %s\n", $line_no, $record;
    }
}

Re^3: A problem with Text::CSV
by Tux (Canon) on Mar 27, 2008 at 07:36 UTC

    This is changing your script from good to bad. This is exactly what you should NOT do. The trailing ^M is part of the field and should not be removed.

    Use the comma-counting code from Narveson, and check whether the lines that have a trailing ^M also have fewer commas than the lines that seem to be correct. Note that even that is unreliable, as commas can be part of a field when correctly quoted.
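Narveson's code is not reproduced in this node, so the following is only a minimal sketch of what such a comma-counting check might look like. The names `count_commas` and `odd_lines`, the expected count of 3, and the sample lines are all hypothetical. The last sample line shows why the check is unreliable: a correctly quoted comma inflates the count.

```perl
use strict;
use warnings;

# Count the commas on one raw line. tr/// with an empty replacement
# list only counts; it does not modify the string.
sub count_commas {
    my ($line) = @_;
    return scalar( $line =~ tr/,// );
}

# Return [line_number, comma_count] pairs for every line whose comma
# count differs from the expected one (hypothetical helper).
sub odd_lines {
    my ($expected, @lines) = @_;
    my @odd;
    for my $i (0 .. $#lines) {
        my $n = count_commas($lines[$i]);
        push @odd, [ $i + 1, $n ] if $n != $expected;
    }
    return @odd;
}

# Made-up sample data: lines 2 and 3 are one record split by an
# embedded newline; line 4 is valid but has a quoted comma.
my @sample = (
    "1,2,3,4\n",
    "1,\"two\r\n",
    "more\",3,4\n",
    "1,\"2,5\",3,4\n",
);

for my $hit (odd_lines(3, @sample)) {
    printf "line %d has %d commas\n", @$hit;
}
```

A line flagged here is only a candidate for inspection, not proof of a broken record.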

    The best way to find the problematic lines (if any) is to call the new () constructor with no arguments at all and see where the parsing stops. Then use Text::CSV's diagnostics to see what caused the stop.
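The approach described above could be sketched roughly as follows, assuming a reasonably recent Text::CSV that provides `getline`, `eof`, and `error_diag`. The helper name `find_first_stop` and the sample data are made up for illustration, and the exact diagnostic code and message depend on the installed version.

```perl
use strict;
use warnings;
use Text::CSV;

# Read records until the parser stops. Returns undef if the whole
# handle was consumed, otherwise a hash ref describing the stop.
# (Hypothetical helper, not part of Text::CSV.)
sub find_first_stop {
    my ($fh) = @_;
    my $csv  = Text::CSV->new();    # no arguments, as suggested above
    my $good = 0;
    $good++ while $csv->getline($fh);
    return if $csv->eof;            # reached end of input
    my ($code, $message, $position) = $csv->error_diag();
    return {
        good_records => $good,
        code         => $code,
        message      => $message,
        position     => $position,
    };
}

# Made-up sample: the second line has a stray quote in an unquoted field.
my $data = "1,2,3\n4,bad\"quote,6\n7,8,9\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

if (my $stop = find_first_stop($fh)) {
    printf "stopped after %d good record(s): error %s (%s) at position %s\n",
        @{$stop}{qw(good_records code message position)};
}
else {
    print "all records parsed\n";
}
close $fh;
```

Because this reads with `getline` instead of splitting the input on newlines first, a record whose quoted field spans several physical lines is handed to the parser as one unit, and the diagnostic points at the place where parsing actually failed.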


    Enjoy, Have FUN! H.Merijn
