>perl -e"print qq{\"a\nb\",c\nd,e\n}" | perl -MText::CSV -le"print join '|', @{ Text::CSV->new({binary=>1})->getline(*STDIN) }"
a
b|c
Grandfather, Text::CSV has grown a lot of configuration settings. From the docs, on the verbatim attribute:
This is a quite controversial attribute to set, but it makes hard things possible.
The basic thought behind this is to tell the parser that the normally special characters newline (NL) and Carriage Return (CR) will not be special when this flag is set, and be dealt with as being ordinary binary characters. This will ease working with data with embedded newlines.
When verbatim is used with getline (), getline auto-chomp's every line.
Imagine a file format like
    M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n
where the line ending is a very specific "#\r\n", and the sep_char is a ^ (caret). None of the fields is quoted, but embedded binary data is likely to be present. With the specific line ending, that shouldn't be too hard to detect.
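To make the record layout in that quote concrete, here is a minimal core-Perl sketch. It does not use Text::CSV or its verbatim attribute at all; it only shows that once "#\r\n" is treated as the record terminator, the bare "\n" inside a field is just data. The helper name parse_records and the in-memory sample are my own assumptions for illustration.

```perl
use strict;
use warnings;

# Sample record in the format quoted above: "^" separates fields,
# "#\r\n" ends a record, and a bare "\n" may appear inside a field.
my $data = "M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n";

# parse_records is a hypothetical helper, not part of Text::CSV.
sub parse_records {
    my ($text) = @_;
    open my $fh, '<:raw', \$text
        or die "Cannot open in-memory handle: $!";
    local $/ = "#\r\n";    # read up to the custom record terminator
    my @records;
    while (my $line = <$fh>) {
        chomp $line;       # chomp strips the current $/, i.e. "#\r\n"
        # A limit of -1 keeps empty trailing fields.
        push @records, [ split /\^/, $line, -1 ];
    }
    close $fh;
    return \@records;
}

for my $fields (@{ parse_records($data) }) {
    print scalar(@$fields), " fields\n";
}
```

A plain split like this has no notion of quoting or escaping, so it only fits data where, as in the quoted example, none of the fields is quoted.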
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
The thinking behind using an IO handle rather than a string was that I could use the Text::CSV::getline_hr() method to do some post processing of the uploaded data and store the results in a separate data structure.
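For anyone following along, this is roughly what I had in mind: a minimal sketch, assuming the upload has a header row and is already in a scalar. The variable $upload and the column names name and note are made up for the example; binary => 1 is what lets the quoted field keep its embedded newline.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical upload: a header row, then a record whose first
# field contains an embedded newline inside quotes.
my $upload = qq{name,note\n"a\nb",c\nd,e\n};

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

# Open the scalar as an in-memory file so getline can read across
# physical lines when a quoted field spans them.
open my $fh, '<', \$upload
    or die "Cannot open in-memory handle: $!";

# Use the first row as the hash keys for getline_hr.
$csv->column_names(@{ $csv->getline($fh) });

my @rows;
while (my $row = $csv->getline_hr($fh)) {
    push @rows, $row;    # each $row is a hash ref keyed by column name
}
close $fh;

for my $row (@rows) {
    (my $name = $row->{name}) =~ s/\n/\\n/g;    # make the newline visible
    print "$name|$row->{note}\n";
}
```

Each record ends up as a hash ref in @rows, which is the separate data structure I wanted for post processing.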
However, my immediate requirement is to get the data into a spreadsheet, so Text::CSV_XS looks like a better option. Its man page has some useful advice on how to handle the embedded newlines.
Thanks to everyone who replied.