
Re: Recommendations for parsing invalid CSV

by Corion (Patriarch)
on Apr 21, 2008 at 14:06 UTC (#681921)

in reply to Recommendations for parsing invalid CSV

I'm not aware of any tool that does that, but the problematic lines are not hard to find: they contain more delimiters than a line with the expected number of columns should have. All other lines are trivially decodable and pose no problem, because the quotes are unnecessary there.
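To make the delimiter-counting idea concrete, here is a minimal sketch in Python (the same logic carries over to Perl directly). It assumes a comma delimiter, one record per line, and a known column count; `partition_lines` and `strip_quotes` are hypothetical names made up for this illustration, not part of any library.

```python
def strip_quotes(field):
    """Drop one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field


def partition_lines(lines, n_columns, delimiter=","):
    """Split lines into trivially decodable ones and suspect ones.

    A line with exactly n_columns - 1 delimiters cannot contain a
    delimiter inside a field, so a plain split is safe there.
    Anything else is set aside, untouched, for closer inspection.
    """
    expected = n_columns - 1
    clean, suspect = [], []
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.count(delimiter) == expected:
            fields = [strip_quotes(f) for f in text.split(delimiter)]
            clean.append((lineno, fields))
        else:
            suspect.append((lineno, text))
    return clean, suspect
```

Lines with too few delimiters also land in the suspect list here, which is an extra precaution rather than something the approach strictly requires.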

For the problematic lines, I would parse the complete line, treating ," as the start of a quoted string and ", as the end of it. I would also log all of these lines for later human inspection. You could also consider asking a human about them instead of writing a single guessed interpretation into the DB.
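A rough sketch of that heuristic, again in Python and again under assumptions: comma delimiter, one record per line, and a quoted field at the start or end of the line counts as opened or closed by the bare quote. `split_suspect_line`, `repair_suspect_lines`, and the logger name `csv_repair` are hypothetical names for this example. The heuristic can still guess wrong (for instance when a field itself contains `",`), which is why every line it touches gets logged.

```python
import logging

logger = logging.getLogger("csv_repair")


def split_suspect_line(line, delimiter=","):
    """Split one line, treating ," as quote start and ", as quote end."""
    fields = []
    closer = '"' + delimiter
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == '"':
            # Quoted field: runs until the next '",' or a final '"'.
            end = line.find(closer, i + 1)
            if end == -1:
                if line.endswith('"') and n - 1 > i:
                    end = n - 1
                else:
                    # Unterminated quote: keep the rest as one field.
                    fields.append(line[i + 1:])
                    break
            fields.append(line[i + 1:end])
            i = end + len(closer)
        else:
            # Unquoted field: runs until the next delimiter.
            end = line.find(delimiter, i)
            if end == -1:
                fields.append(line[i:])
                break
            fields.append(line[i:end])
            i = end + len(delimiter)
    return fields


def repair_suspect_lines(suspect, n_columns, delimiter=","):
    """Apply the heuristic; anything that still does not fit goes to a human."""
    repaired, needs_review = [], []
    for lineno, text in suspect:
        fields = split_suspect_line(text, delimiter)
        logger.warning("line %d parsed heuristically: %r -> %r",
                       lineno, text, fields)
        if len(fields) == n_columns:
            repaired.append((lineno, fields))
        else:
            needs_review.append((lineno, text))
    return repaired, needs_review
```

Lines that end up in `needs_review` are the ones worth showing to a person rather than loading with a guessed interpretation.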

Replies are listed 'Best First'.
Re^2: Recommendations for parsing invalid CSV
by markjugg (Curate) on Apr 21, 2008 at 14:34 UTC

    I think the suggestion to focus on the "problem lines" is a good one. If I have to pre-parse the file by hand, I think I'll do that.
