in reply to CSV data processing

Your question doesn't make sense. Text::CSV parses lines (which is a weakness of the module), not files.

Maybe what you want is Text::xSV. Consider:

use strict;
use warnings;
use Text::xSV;

my $str = <<FILE;
a,2,"3
a",4,5,6
b,2,3,4,5,6
FILE

open my $inFile, '<', \$str;

my $xsv = Text::xSV->new(fh => $inFile);
$xsv->bind_fields(qw(1 2 3 4 5 6));

while (my @parts = $xsv->get_row()) {
    s!\n!!g for @parts;
    print "@parts\n";
}

close $inFile;

Prints:

a 2 3a 4 5 6
b 2 3 4 5 6

Perl's payment curve coincides with its learning curve.

Replies are listed 'Best First'.
Re^2: CSV data processing
by CountZero (Bishop) on Dec 09, 2008 at 23:20 UTC
    Grandfather,

    Strange as it may be, Text::CSV allows you to do

    $colref = $csv->getline($io);   # Read a line from file $io,
                                    # parse it and return an array
                                    # ref of fields
    It is like an iterator over the file on a line-by-line basis.
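    A minimal sketch of that line-by-line usage (the file name data.csv and the comma layout here are just placeholders, not anything from the thread):

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open my $io, '<', 'data.csv' or die "data.csv: $!";

# each getline() call returns one parsed record as an array ref,
# so the loop walks the file record by record
while (my $colref = $csv->getline($io)) {
    print join(', ', @$colref), "\n";
}

close $io;
```

    Each call advances through the file, so it behaves like an iterator over records rather than over raw lines.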

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Indeed, that is so with later versions of Text::CSV (although not with earlier versions). But Text::CSV doesn't solve the OP's problem, which is that some of the fields straddle lines. Text::xSV handles that case (so long as the field is suitably quoted); Text::CSV does not.



        It does for me.

        >perl -e"print qq{\"a\nb\",c\nd,e\n}" | perl -MText::CSV -le"print join '|', @{ Text::CSV->new({binary=>1})->getline(*STDIN) }"
        a
        b|c
        Grandfather,

        Text::CSV has grown a lot of configuration settings (from the docs):

        verbatim

        This is a quite controversial attribute to set, but it makes hard things possible.

        The basic thought behind this is to tell the parser that the normally special characters newline (NL) and carriage return (CR) are not special when this flag is set, and are dealt with as ordinary binary characters. This will ease working with data with embedded newlines.

        When verbatim is used with getline (), getline auto-chomps every line.

        Imagine a file format like

        M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n

        where the line ending is a very specific "#\r\n", and the sep_char is a ^ (caret). None of the fields is quoted, but embedded binary data is likely to be present. With the specific line ending, that shouldn't be too hard to detect.
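        A sketch of a parser set up for that format, assuming a Text::CSV recent enough to honor eol when parsing; the sample record is the one from the docs, and the $/ assignment is my own belt-and-braces assumption for versions that read records via the input record separator:

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({
    binary   => 1,        # fields may contain binary data
    verbatim => 1,        # NL and CR inside fields are ordinary characters
    sep_char => '^',      # caret-separated fields
    eol      => "#\r\n",  # the format's very specific line ending
}) or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

my $data = "M^^Hans^Janssen^Klas 2\n2A^Ja^11-06-2007#\r\n";
open my $fh, '<', \$data or die $!;

local $/ = "#\r\n";   # assumption: helps versions that read via $/
while (my $row = $csv->getline($fh)) {
    print join('|', @$row), "\n";
}
close $fh;
```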

        CountZero


        The thinking behind using an IO handle rather than a string was that I could use the Text::CSV::getline_hr() method to do some post processing of the uploaded data and store the results in a separate data structure.

        However, my immediate requirement is to get the data into a spreadsheet, so Text::CSV_XS looks like a better option. Its man page has some useful advice on how to handle the embedded newlines.
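        For the record, a sketch of that embedded-newline handling with Text::CSV_XS and binary => 1, run against data shaped like the original sample (reading from a string handle is just for the example):

```perl
use strict;
use warnings;
use Text::CSV_XS;

# sample data with a quoted field that straddles two lines
my $data = qq{a,2,"3\na",4,5,6\nb,2,3,4,5,6\n};
open my $fh, '<', \$data or die $!;

my $csv = Text::CSV_XS->new({ binary => 1 });
while (my $row = $csv->getline($fh)) {
    tr/\n/ / for @$row;     # flatten the embedded newline for display
    print join('|', @$row), "\n";
}
close $fh;
```

        With binary => 1, getline() keeps reading until the quoted field (and hence the record) is complete, so the straddling field comes back as a single value.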

        Thanks to everyone who replied.