geofisch has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to export information from an application. It's built-in export function allows for writing a quoted, comma delimited file, but some of the fields are free-form, and contain line breaks (CRLF, since I'm working on WinNT). This means that when I try to read the file with Excel, those records are interpreted as multiple records. I wrote a quick script to strip the line breaks from within the fields, but I'm sure there is a better way to do it. Who wants to show me?
use strict; undef $/; my $infile = (<>); my @fields = $infile =~ /(".*?"|\n)/gms; my $place = "\n"; foreach (@fields) { if ($place ne "\n" && $_ ne "\n") {print ",";}; unless ($_ eq "\n") {s/\n//g;}; print "$_"; $place = $_; }

Replies are listed 'Best First'.
Re: CRLF in a csv
by Anonymous Monk on Apr 04, 2001 at 22:34 UTC
    /(".*?"|\n)/gms;
    That won't work with the text "John said \"Hello\"". Text::ParseWords can handle this.
      You're right, of course. This seems to be better:
      use strict; use Text::ParseWords; undef $/; my $infile = (<>); my @fields = &parse_line("\n", 1, $infile); foreach (@fields) { s/\n/ /; print "$_\n"; }
      It handles "John said \"Hello\"". As a matter of fact, it handles it better than Excel, which creates a field that looks like
      John said \Hello\""
        But, then again... The Text::ParseWords module is certainly powerful, but I can't seem to make it handle errors very well. It does not like
        "John said "Hello"","again"
        So I went back to my earlier method with a better regex
        undef $/; my $infile = (<>); my @fields = $infile =~ /(".*?"(?=,|\n)|\n)/gms; my $place = "\n"; foreach (@fields) { if ($place ne "\n" && $_ ne "\n") {print ",";}; unless ($_ eq "\n") {s/\n//g;}; print "$_"; $place = $_; }
Re (tilly) 1: CRLF in a csv
by tilly (Archbishop) on Apr 05, 2001 at 06:55 UTC
    Are you sure that embedded returns are a problem for Excel?

    Save the following file as a .csv file and try to read in Excel. Based on past experience with Excel and Access I believe that it will work properly, despite the freeform fields:

    name,status,comment Barney,0,"Shot, now dinosaurs are really extinct!" Popeye,1,"If he starts singing, ""I'm Popeye the sailor man..."" look out for the spinach!" "Snow White",9,"Much better looking than the dwarves!"
    Note the embedded double-quotes...
      You, too, are correct. Excel handles embedded returns in .csv files without a problem. The default extension for the exported file is .asc, so I was trying to import it into Excel with that name. If I rename the file first, Excel does a much better job. Thanks.

      Oh well. At least I learned about Text::ParseWords.