in reply to Regex with malformed CSV files

Newlines and commas embedded in double quotes are not a problem: Text::CSV_XS (with binary => 1) and Text::xSV will handle them fine. As for unescaped quotes inside quoted fields, I can't imagine there's a way to fix those automatically. Personally, I'd parse the file with Text::CSV_XS and check whether each parse succeeds. I'd write the successful parses to one file and the unsuccessful ones to another, and then escape the quotes in the bad file by hand :-(.
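A minimal sketch of that good/bad split in Perl follows. It assumes one record per physical line, so a quoted field that spans lines would also end up in the bad file. The helper name split_good_bad and the sample data are hypothetical. Only new, parse and error_diag come from Text::CSV_XS.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: copy each raw line to $good or $bad depending on
# whether Text::CSV_XS can parse it. Assumes one record per physical line,
# so a quoted field that spans lines would also land in the bad pile.
sub split_good_bad {
    my ($in, $good, $bad) = @_;    # three already-open filehandles
    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot create parser: " . Text::CSV_XS->error_diag;
    my ($n_good, $n_bad) = (0, 0);
    while (my $line = <$in>) {
        (my $record = $line) =~ s/\r?\n\z//;    # parse without the line ending
        if ($csv->parse($record)) {
            print {$good} $line;
            $n_good++;
        }
        else {
            print {$bad} $line;                 # to be fixed by hand later
            $n_bad++;
        }
    }
    return ($n_good, $n_bad);
}

# Usage with made-up in-memory data; real files are opened the same way.
my $data = qq{a,b,c\n"x","he said "hi"",z\n1,"2,5",3\n};
open my $in,   '<', \$data       or die "Cannot open input: $!";
open my $good, '>', \my $ok_out  or die "Cannot open good output: $!";
open my $bad,  '>', \my $bad_out or die "Cannot open bad output: $!";
my ($n_g, $n_b) = split_good_bad($in, $good, $bad);
close $good;
close $bad;
print "good=$n_g bad=$n_b\n";
```

Here the line with the unescaped quotes around hi fails to parse and goes to the bad output. The quoted comma in "2,5" parses fine.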

Re^2: Regex with malformed CSV files
by ickyb0d (Monk) on Jan 10, 2006 at 20:14 UTC
    Thanks for the info! Here's what I ended up doing with Text::CSV_XS:

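    while (!$csv->parse($line) || $csv->fields < $COLUMNS) {
        defined(my $next = <IN>) or last;   # stop at EOF instead of looping forever
        $line =~ s/\r//g;
        $line =~ s/\n$/=0D=0A/;             # mark the embedded line break
        $line .= $next;
    }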


    So while the line fails to parse ($csv->parse) or has fewer fields ($csv->fields) than the expected number of columns ($COLUMNS), it keeps appending the next line from the file, replacing each embedded line break with the marker =0D=0A.
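    Below is a self-contained sketch of the same idea. It is a variant rather than the exact code above. It strips the line ending from each physical line, joins continuation lines with the =0D=0A marker, requires at least $columns fields, and stops at end of file. The names read_records and parsed_ok and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: true if $line parses and has at least $columns fields.
sub parsed_ok {
    my ($csv, $line, $columns) = @_;
    return 0 unless $csv->parse($line);
    my @fields = $csv->fields;
    return @fields >= $columns;
}

# Hypothetical helper: read logical records from $fh, gluing physical lines
# together (joined with the =0D=0A marker used in the post) until each one
# parses and has at least $columns fields.
sub read_records {
    my ($fh, $columns) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot create parser: " . Text::CSV_XS->error_diag;
    my @records;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;                   # drop the physical line ending
        my $complete;
        until ($complete = parsed_ok($csv, $line, $columns)) {
            defined(my $next = <$fh>) or last;  # EOF: stop instead of looping forever
            $next =~ s/\r?\n\z//;
            $line .= "=0D=0A" . $next;          # mark where the line break was
        }
        if ($complete) {
            push @records, [ $csv->fields ];
        }
        else {
            warn "Incomplete record at end of file: $line\n";
        }
    }
    return \@records;
}

# Usage with made-up in-memory data: the second record spans two lines.
my $data = qq{id,text,n\n1,"first line\nsecond line",2\n3,plain,4\n};
open my $fh, '<', \$data or die "Cannot open input: $!";
my $records = read_records($fh, 3);
print join('|', @$_), "\n" for @$records;
```

    Stripping the line ending before parsing keeps the check simple. A record with an open quote then fails with an unterminated quoted field until its closing line has been appended.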

    Thanks again!