barakuda has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a problem. Some data comes in as a .CSV file. There are 10 columns, and the values in the 10th column may or may not contain commas! So the CSV formatting gets messed up by that 10th column (i.e. some values jump to the 11th, 12th, etc. columns). What I was thinking is to use regular expressions to match the last column, then substitute the commas with semicolons, and finally rewrite the data. What I have now (and what doesn't work) is:

open (OUT, ">file.csv");
foreach my $line (@out){
    $line =~ m/(.*?,){10}/;
    my $tmp = $1;
    chomp ($tmp);
    $tmp =~ s/,/;/;
    ... WHAT HERE? ...
}
close (OUT);

I don't know where to go from here. Efficiency is also a concern, since the file has 7500 lines and string operations like that may take a considerable amount of time.

Any suggestions?

Replies are listed 'Best First'.
Re: CSV problem
by Tux (Canon) on Mar 07, 2008 at 16:39 UTC

    Parsing real CSV data is an art, best not done with regular expressions. In your case, if it is that simple, use split:

    my @f10 = split m/,/ => $line, 10;

    The last field now gobbles up the rest of the line. If you used a real parser, it might look something like:

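    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({ binary => 1 });
    foreach my $line (@out) {
        $csv->parse ($line) or next;   # parse returns a status, not the fields
        my @f = $csv->fields;          # the parsed fields of the last line
        # fold everything from the 10th field onward back into one field
        @f > 10 and $f[9] = join ",", splice @f, 9, $#f;
    }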

    But the above approach will lose the quoting.
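Returning to the plain split approach, here is a minimal sketch of the substitution the original question asks for. It assumes that only the 10th field can contain commas and that no field is quoted; `fix_line` is a hypothetical helper name, not something from the thread. Note that `s/,/;/` without the `/g` flag only replaces the first comma, so the sketch uses `tr` to replace all of them.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first 9 fields as they are and turn every
# comma in the 10th (free-text) field into a semicolon.
sub fix_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /,/, $line, 10;    # limit 10: the 10th field keeps the rest of the line
    $f[9] =~ tr/,/;/ if @f == 10;    # tr replaces every comma, not just the first
    return join ',', @f;
}

print fix_line("1,2,3,4,5,6,7,8,9,some, free, text\n"), "\n";
# prints: 1,2,3,4,5,6,7,8,9,some; free; text
```

Lines with fewer than 10 fields pass through unchanged. For 7500 lines this should take well under a second on ordinary hardware.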


    Enjoy, Have FUN! H.Merijn
Re: CSV problem
by moritz (Cardinal) on Mar 07, 2008 at 16:53 UTC
    Another attempt...

    CSV is a fairly loose term, everybody thinks differently about it.

    Most CSV dialects use quotes to mark fields that may contain commas. But you always have to check that the tools that read the data actually handle quotes in the same way.

    my @records = split m/,/, $line, 10;
    $records[-1] = qq["$records[-1]"];
    print OUT join(',', @records), "\n";
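One caveat with the quoting approach: if the last field can itself contain double quotes, many CSV readers expect them to be doubled. A minimal sketch, assuming the incoming field is not already quoted and that the consuming tool follows the doubled-quote convention; `quote_last_field` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the 10th field in double quotes, doubling any
# quotes already inside it.
sub quote_last_field {
    my ($line) = @_;
    chomp $line;
    my @records = split /,/, $line, 10;
    if (@records == 10) {
        (my $last = $records[-1]) =~ s/"/""/g;   # escape embedded quotes
        $records[-1] = qq["$last"];
    }
    return join ',', @records;
}

print quote_last_field(qq[1,2,3,4,5,6,7,8,9,he said "hi", then left\n]), "\n";
# prints: 1,2,3,4,5,6,7,8,9,"he said ""hi"", then left"
```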
Re: CSV problem
by Cristoforo (Curate) on Mar 07, 2008 at 16:36 UTC
    Several modules could help you here - DBD::CSV or Text::CSV.
Re: CSV problem
by moritz (Cardinal) on Mar 07, 2008 at 16:33 UTC

    Update: forget the rest of the post, misread the question. Thanks cbu

    You can use split's third argument to limit the number of results to 10.

    Look at that short example:

    perl -MData::Dumper -wle '$_="a,b,c,,,d,e"; print Dumper [split m/,/, $_, 3]'
    $VAR1 = [
              'a',
              'b',
              'c,,,d,e'
            ];

    Update: you should use while to iterate over your file, because your solution will first slurp all of the file into memory, which is slow.

    open (OUT, ">", "file.csv") or die "Can't read file.csv: $!";
    while (my $line = <OUT>){
        chomp $line;
        my @records = split m/,/, $line, 10;
        # do your work here
    }

      You've misread the direction. He reads from an array @out and writes to the file handle OUT. Your example tries to read from a file handle opened for output.


      Enjoy, Have FUN! H.Merijn
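
Putting the direction right, a line-by-line version could look like the sketch below: read from one handle, write to another, and never hold the whole file in memory. It assumes the same unquoted, 10-column input as the question; `convert` is a hypothetical helper name, and the in-memory handles only stand in for real files so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: copy records from $in to $out one line at a time,
# turning commas in the 10th field into semicolons.
sub convert {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my @f = split /,/, $line, 10;
        $f[9] =~ tr/,/;/ if @f == 10;
        print {$out} join(',', @f), "\n";
    }
    return;
}

# With real files you would open the input with '<' and the output with '>'
# on two different file names instead of these scalar references.
my $input  = "1,2,3,4,5,6,7,8,9,a, b\n1,2,3,4,5,6,7,8,9,plain\n";
my $output = '';
open my $in,  '<', \$input  or die "Can't read input: $!";
open my $out, '>', \$output or die "Can't write output: $!";
convert($in, $out);
close $out or die "Can't close output: $!";
close $in;
print $output;
```

Lexical file handles and the three-argument form of open keep the read and write directions explicit, which avoids the mix-up discussed above.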