I wonder how much code would break if everything RFC 4180bis proposes were blindly implemented by parsers.
I bet there are tons of CSV files out there that are not UTF-8, do not use a BOM correctly, or are plain binary to start with.
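To make the encoding worry concrete, here is a minimal Python sketch of what a tolerant reader might do: honor a BOM when one is present, try UTF-8 otherwise, and fall back to a single-byte encoding instead of failing. The function name `decode_csv_bytes` and the `latin-1` fallback are assumptions chosen for illustration, not something the RFC prescribes.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_csv_bytes(raw, fallback="latin-1"):
    """Return (text, encoding) for raw CSV bytes (hypothetical helper)."""
    for bom, enc in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM and decode with the encoding it announces.
            return raw[len(bom):].decode(enc), enc
    try:
        # No BOM: assume UTF-8, as the draft would like.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: legacy single-byte data; latin-1 accepts any byte.
        return raw.decode(fallback), fallback
```

A strict parser would raise in that last branch, which is exactly where existing files would start to break.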
And what about point 8:
A hash sign MAY be used to mark lines that are meant to be commented lines. A commented line can contain any whitespace or visible character until it is terminated by a line break (CR, LF or CRLF). A comment line MAY appear in any line of the file (before or after an OPTIONAL header) but MUST NOT be mistaken with a subsequent line of a multi-line field. Subsequent lines of multi-line fields can start with a hash sign and MUST NOT interpreted as comments. For example:

    #commentCRLF
    aaa,bbb,cccCRLF
    #comment 2CRLF
    "aaa","this is CRLF
This would require new options in all parsers to skip lines that start with a #.
As Text::CSV_XS already implements/supports all other options, I wonder if there would be enough motivation to add attributes to recognize/skip comments. That would also require a new config variable that contains the comment lead-in (# and // spring to mind), and a decision on whether a leading comment string is only valid if followed by whitespace (there are probably more things to consider). This would also have an impact on strict, as comments (and empty lines) will, by definition, not have the same number of fields as the rest of the data.
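As a rough illustration of the options discussed here, the following Python sketch filters comment lines before they reach the parser. It is not how Text::CSV_XS works; `strip_comments`, `comment_str` and `need_space` are hypothetical names, and the quote tracking assumes `"` as the quote character with embedded quotes escaped by doubling.

```python
import csv

def strip_comments(lines, comment_str="#", need_space=False):
    """Yield physical lines, dropping comment lines (hypothetical helper)."""
    in_quotes = False  # True while inside a multi-line quoted field
    for line in lines:
        if not in_quotes and line.startswith(comment_str):
            rest = line[len(comment_str):]
            # Optionally require whitespace (or nothing) after the lead-in.
            if not need_space or rest == "" or rest[:1].isspace():
                continue
        # An odd number of quotes opens or closes a quoted field; doubled
        # quotes ("") come in pairs and leave the parity unchanged.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        yield line

# Made-up sample data: the "# still data" line continues a quoted field.
data = 'id,note\n# a comment\n1,"first\n# still data"\n#no space\n2,plain\n'
rows = list(csv.reader(strip_comments(data.splitlines(keepends=True))))
```

Because the comments are removed before parsing, a strict field-count check would never see them. With `need_space=True` the `#no space` line would instead be passed through as a one-field record, which is the kind of mismatch strict would then report.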
Ideas welcome as usual.
Re: New RFC for CSV in the pipeline
by Tux (Canon) on Mar 20, 2021 at 16:35 UTC