in reply to splitting cvs file without line breaks

If I weren't concerned about a comma appearing inside a quoted field, I might do this:

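use strict;
use warnings;

my $columns = 6;    # or 10 or 14
local $/ = ',';     # read one comma-terminated field at a time
my @fields;

while (<>) {
    chomp;                  # strip the trailing comma (the current $/)
    s/\n\z// if eof;        # drop the file's final newline, if any
    push @fields, $_;
    if ( @fields == $columns ) {
        print join( ',', @fields ), "\n";
        @fields = ();
    }
}
print join( ',', @fields ), "\n" if @fields;    # leftover partial row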

If I do think that a field might have a comma in it, I think I'd try to use Text::CSV.

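use strict;
use warnings;
use Text::CSV;

my $columns = 6;    # or 10 or 14

my $wholefile = do { local $/; <> };    # slurp the entire input
chomp $wholefile;

my $csv = Text::CSV->new()
    or die "Cannot create Text::CSV object";
$csv->parse($wholefile)
    or die "Could not parse input: " . $csv->error_input();
my @fields = $csv->fields();

while (@fields) {
    # Take the next $columns fields and write them out as one CSV line.
    my @line_fields = splice @fields, 0, $columns;
    $csv->combine(@line_fields)
        or die "Could not combine fields: " . $csv->error_input();
    print $csv->string(), "\n";
}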

I'm not really all that familiar with Text::CSV, so I wrote that based on the synopsis. Take it with a grain of salt.
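
To show why a quoted comma matters, here is a small sketch using only core Perl. The record is made up for illustration; it has three fields, and the second one contains a comma.

```perl
use strict;
use warnings;

# Hypothetical record: three fields, the second contains a comma.
my $record = '1,"Smith, John",3';

# A naive split treats the comma inside the quotes as a separator too.
my @pieces = split /,/, $record;

print scalar(@pieces), " pieces\n";
print "[$_]\n" for @pieces;
```

The split yields four pieces rather than three, because the quoted field is cut in half. That is the case a real CSV parser is meant to handle.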

Update: Since writing this, I noticed that shmem saw something I missed: the last column of each record has no comma between it and the first column of the next record. Neither of my solutions works in that case. The file would have to be repaired first, as suggested.
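
To make that problem concrete, here is a minimal sketch. The data is made up, and it assumes the records are run together with nothing at all between them, which may not match the real file.

```perl
use strict;
use warnings;

# Hypothetical data: two 3-column records concatenated with no separator.
my $data = join '', '1,2,3', '4,5,6';

# Splitting on commas cannot tell where one record ends and the next begins.
my @fields = split /,/, $data;

print scalar(@fields), " fields: @fields\n";
```

Six values went in, but only five fields come out, because "3" and "4" are fused into "34". Counting off a fixed number of fields then drifts by one for every record boundary.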

Re^2: splitting cvs file without line breaks
by rendier (Initiate) on May 15, 2007 at 18:56 UTC
    Thanks all for your input. I was thinking of using a regex that matches x comma-separated fields and cuts them off from the rest of the line, so that I would not have to hard-code it based on a date or any other type of content.

    something like

    if ( $file =~ m/((?:(?:"[^"]*"|\d+|[\d :]+),){6}) (.*)/ ) {
        $line       = $1;    # the first six comma-terminated fields
        $restoffile = $2;    # everything after them
    }

    I'll see if this works, otherwise I'll punch up one of these suggestions.

    rendier
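
One thing to watch with a regex that repeats a group a fixed number of times: a capturing group under a quantifier keeps only its last repetition. The sketch below uses a made-up input string and three single-character fields to keep it short; the same idea should carry over to six fields.

```perl
use strict;
use warnings;

# Hypothetical input: three comma-terminated fields, then the remainder.
my $text = 'a,b,c,rest';

# Capturing group repeated three times: it keeps only the last repetition.
my ($last_only) = $text =~ /^(\w,){3}/;

# Non-capturing inner group inside one outer capture: the whole run is kept.
my ( $run, $rest ) = $text =~ /^((?:\w,){3})(.*)/;

print "repeated capture: $last_only\n";
print "outer capture:    $run | $rest\n";
```

The first pattern captures just "c,", while the second captures "a,b,c," and leaves "rest" in the following group. Wrapping the repeated part in (?: ... ) and adding one outer pair of parentheses is what makes the run of fields available as a single capture.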