in reply to Re^2: CSV data processing
in thread CSV data processing

Indeed, that is so with later versions of Text::CSV (although not with earlier versions). But Text::CSV doesn't solve the OP's problem, which is that some of the fields straddle lines. Text::xSV handles that case (so long as the field is suitably quoted); Text::CSV does not.


Perl's payment curve coincides with its learning curve.

Re^4: CSV data processing
by ikegami (Patriarch) on Dec 10, 2008 at 00:43 UTC

    It does for me.

    >perl -e"print qq{\"a\nb\",c\nd,e\n}" | perl -MText::CSV -le"print join '|', @{ Text::CSV->new({binary=>1})->getline(*STDIN) }"
    a
    b|c
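
    For anyone who prefers a script to a one-liner, here is a minimal sketch of the same idea. It assumes a reasonably recent Text::CSV and reads the sample data from an in-memory filehandle instead of STDIN; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

# Same sample data as the one-liner: the first field contains a newline.
my $data = qq{"a\nb",c\nd,e\n};

# binary => 1 lets quoted fields contain embedded newlines.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Read from an in-memory filehandle so the example needs no input file.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;
}
close $fh;

# Print one record per line, with embedded newlines shown as a literal \n.
for my $row (@rows) {
    print join('|', map { (my $f = $_) =~ s/\n/\\n/g; $f } @$row), "\n";
}
```

    The first record comes back as two fields, the first of which still contains its newline, so the quoted field is not split across records.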
Re^4: CSV data processing
by CountZero (Bishop) on Dec 10, 2008 at 06:53 UTC
    Grandfather,

    Text::CSV has grown a lot of configuration settings (from the docs):

    verbatim

    This is a quite controversial attribute to set, but it makes hard things possible.

    The basic thought behind this is to tell the parser that the normally special characters newline (NL) and Carriage Return (CR) will not be special when this flag is set, and be dealt with as being ordinary binary characters. This will ease working with data with embedded newlines.

    When verbatim is used with getline (), getline auto-chomp's every line.

    Imagine a file format like

    M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

    where, the line ending is a very specific "#\r\n", and the sep_char is a ^ (caret). None of the fields is quoted, but embedded binary data is likely to be present. With the specific line ending, that shouldn't be too hard to detect.
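
    A rough sketch of how that example might be parsed, assuming verbatim is combined with binary, a matching eol and a caret sep_char. The option combination is my reading of the docs, not something they spell out as a recipe, so treat it as a starting point.

```perl
use strict;
use warnings;
use Text::CSV;

# The record from the docs excerpt: caret-separated, unquoted, terminated
# by "#\r\n", with a newline inside the fifth field.
my $data = "M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n";

# Assumed option set: verbatim makes "\n" an ordinary character, binary
# permits it in a field, eol names the real record terminator.
my $csv = Text::CSV->new({
    sep_char => '^',
    eol      => "#\r\n",
    binary   => 1,
    verbatim => 1,
}) or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# Also set the input record separator so the underlying readline splits
# on the same terminator.
local $/ = "#\r\n";

my $row = $csv->getline($fh)
    or die "Parse failed: " . $csv->error_diag;
close $fh;

printf "%d fields\n", scalar @$row;
```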

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: CSV data processing
by normskib (Initiate) on Dec 10, 2008 at 09:58 UTC

    The thinking behind using an IO handle rather than a string was that I could use the Text::CSV::getline_hr() method to do some post processing of the uploaded data and store the results in a separate data structure.
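
    Roughly what I had in mind, as a sketch only. The column names and sample data are made up for illustration, and it assumes the first row of the upload is a header row.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical upload: a header row, then records; one field spans two lines.
my $data = qq{name,comment\nalice,"first line\nsecond line"\nbob,ok\n};

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# getline_hr needs column names, so take them from the header row.
my $header = $csv->getline($fh) or die "No header row found";
$csv->column_names(@$header);

# Post-process each record into a separate structure keyed by name.
my %comment_for;
while (my $rec = $csv->getline_hr($fh)) {
    $comment_for{ $rec->{name} } = $rec->{comment};
}
close $fh;

printf "%d records\n", scalar keys %comment_for;
```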

    However, my immediate requirement is to get the data into a spreadsheet, so Text::CSV_XS looks like a better option. Its man page has some useful advice on how to handle the embedded newlines.

    Thanks to everyone who replied.