in reply to Recommendations for parsing invalid CSV

I'm not aware of any tool that does that, but the problematic lines aren't hard to find: they contain more delimiters than a well-formed line, so they split into more fields than there are columns. All other lines decode trivially, because the quotes are unnecessary there.
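A minimal sketch of that check, written in Python for illustration. The function name `split_lines_by_field_count` and the `expected_columns` parameter are hypothetical, and it assumes a single-character delimiter and one record per physical line. Lines with too few delimiters are flagged as well, since they are just as suspicious.

```python
def split_lines_by_field_count(lines, expected_columns, delimiter=","):
    """Separate lines that split cleanly from lines that need a closer look.

    Returns (clean, suspect):
      clean   -> list of (line_number, fields) for lines with the expected
                 number of delimiters
      suspect -> list of (line_number, raw_line) for everything else
    """
    clean, suspect = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # A well-formed row with N columns contains exactly N - 1 delimiters.
        if line.count(delimiter) == expected_columns - 1:
            clean.append((lineno, line.split(delimiter)))
        else:
            suspect.append((lineno, line))
    return clean, suspect
```

Any quotes around fields on the clean lines are left in place here; stripping them is a separate, harmless step.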

For the problematic lines, I would scan the complete line, treating ," as the start of a quoted string and ", as its end. I would also log all of these lines for later human inspection. You could also consider asking a human about them instead of committing a single interpretation to the DB.
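One way this heuristic could look, again as a Python sketch rather than a definitive implementation. The names `parse_suspect_line`, `repair_lines`, and the logger name `csv_repair` are made up for the example. It assumes a quote opens a string only at the start of a field, and that the string ends at the first following `",` or at a closing quote at the end of the line. A stray `",` inside the data will still be misread, which is why every repaired line is logged and anything that still has the wrong field count is set aside for a human.

```python
import logging

log = logging.getLogger("csv_repair")  # hypothetical logger name


def parse_suspect_line(line, delimiter=","):
    """Split one line, honouring quotes only at field boundaries."""
    fields = []
    pos = 0
    while True:
        if line.startswith('"', pos):
            # Quoted field: look for the closing quote followed by a delimiter.
            close = line.find('"' + delimiter, pos + 1)
            if close == -1:
                # Last field on the line: drop a trailing quote if present.
                rest = line[pos + 1:]
                fields.append(rest[:-1] if rest.endswith('"') else rest)
                return fields
            fields.append(line[pos + 1:close])
            pos = close + 1 + len(delimiter)
        else:
            # Unquoted field: runs up to the next delimiter.
            sep = line.find(delimiter, pos)
            if sep == -1:
                fields.append(line[pos:])
                return fields
            fields.append(line[pos:sep])
            pos = sep + len(delimiter)


def repair_lines(suspect, expected_columns, delimiter=","):
    """Parse (line_number, raw_line) pairs; log each one for later review."""
    repaired, unresolved = [], []
    for lineno, line in suspect:
        fields = parse_suspect_line(line, delimiter)
        log.warning("line %d needed repair: %r -> %r", lineno, line, fields)
        if len(fields) == expected_columns:
            repaired.append((lineno, fields))
        else:
            # Still the wrong shape: leave it for a human to decide.
            unresolved.append((lineno, line))
    return repaired, unresolved
```

For example, `parse_suspect_line('7,"a 3" nail, bent",9')` yields three fields, because the quote after `3` is not followed by a delimiter and so does not end the string.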

Re^2: Recommendations for parsing invalid CSV
by markjugg (Curate) on Apr 21, 2008 at 14:34 UTC

    I think the suggestion to focus on the "problem lines" is a good one. If I have to pre-parse the file by hand, I think I'll do that.