in reply to csv output

Please do not go with the existing wrong answers. Here is a description of the basic CSV spec as implemented in most Microsoft products:
  1. Rows are delimited by returns. (\r\n or \n depending on the platform, binmode, etc.)
  2. Fields within a row are delimited by ",". (When saving in "text" format the separator is often "\t" instead.)
  3. Fields may be quoted or unquoted.
  4. Quoted fields are literal text that start and end with an unpaired ". Separators, returns, etc can appear within a quoted field, and " can appear doubled.
  5. Unquoted fields cannot contain the separator, returns, or quotation marks. They are also subject to some interpretation. For instance numbers may appear in floating point, and an empty field is a null (represented within Perl by undef - very few parsers get this right).
  6. It is customary for the first row to be the field names, and for all rows to have the same number of fields.
With that in mind, here is a snippet to format a row:
  # Takes an array and returns it as a CSV row
  sub format_csv {
    my @fields = @_;
    foreach (@fields) {
      if (not defined($_)) {
        $_ = "";
      }
      elsif (0 == length($_)) {
        $_ = '""';
      }
      elsif (/\s|"|'|,/) {
        s/"/""/g;
        $_ = qq("$_");
      }
    }
    (join ",", @fields) . "\n";
  }
With that function, supposing that $file was a file you wanted to write, @cols an array of columns that you wanted to put in a CSV file, and @data was an array of hash references with your data (see References Quick Reference if you don't know what an array of hash references is), you could write it as follows:
  local *FILE;
  open (FILE, "> $file") or die "Cannot write '$file': $!";
  print FILE format_csv(@cols);
  foreach my $row (@data) {
    print FILE format_csv(@$row{@cols});
  }
  close(FILE) or die "Cannot close '$file': $!";
Note that I have put in error checking as very wisely recommended in perlstyle...
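Going the other way, here is a minimal sketch of reading one row back under rules 3 to 5 above. The name parse_csv_row is made up for illustration, and it assumes a well-formed row with no trailing newline and no quoted field spanning lines, so it is not a full parser:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): split one CSV row
# into fields following rules 3-5 above. Assumes a well-formed row with no
# trailing newline and no quoted field that spans lines.
sub parse_csv_row {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: collapse each doubled quote to a single quote.
            (my $text = $1) =~ s/""/"/g;
            push @fields, $text;
        }
        else {
            # Unquoted field: everything up to the next separator.
            $line =~ /\G([^,]*)/gc;
            # An empty unquoted field is a null, hence undef.
            push @fields, length($1) ? $1 : undef;
        }
        # A comma means another field follows; anything else ends the row.
        last unless $line =~ /\G,/gc;
    }
    return @fields;
}
```

For example, parse_csv_row('this,"is "",test 1",,""') should give four fields: this, then is ",test 1, then undef, then the empty string. That is the same distinction between a null and an empty string that format_csv makes when writing.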

Re: Re (tilly) 1: csv output
by Juerd (Abbot) on Mar 14, 2002 at 19:25 UTC

    and " can appear doubled.

    Ouch, that hurts. Mastering Regular Expressions assumes " can be escaped with a backslash.

    The regex the book uses is:

    "([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,
    Would that introduce a Microsoft incompatibility?

    U28geW91IGNhbiBhbGwgcm90MTMgY
    W5kIHBhY2soKS4gQnV0IGRvIHlvdS
    ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
    geW91IHNlZSBpdD8gIC0tIEp1ZXJk
    

      Save the following in your text editor and open in Excel to verify the incompatibility.
      hello,world
      "this","is "",test 1"
      "this","is \",test 2"
      "this","is "" test 3"
      "this","is \" test 4"
      Note in particular the unusual handling of the even tests.
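
Without Excel at hand, the mismatch can still be seen by running test 1 through the regex quoted above. This is only a sketch: split_with_regex is a hypothetical helper, and it shows what the regex does, not what Excel does.

```perl
use strict;
use warnings;

# The regex quoted earlier in the thread from Mastering Regular Expressions.
my $re = qr/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/;

# Hypothetical helper (not from the book or the thread): collect the
# fields the regex yields for one row.
sub split_with_regex {
    my ($line) = @_;
    my @fields;
    while ($line =~ /$re/g) {
        # $1 is set for a quoted field, $3 for an unquoted one;
        # if neither is set the match was a bare comma (empty field).
        push @fields, defined $1 ? $1 : defined $3 ? $3 : '';
    }
    return @fields;
}

my $row = '"this","is "",test 1"';   # test 1 from the sample file
print join(' | ', map { "[$_]" } split_with_regex($row)), "\n";
```

Under rule 4 of the spec the row has two fields, this and is ",test 1. The regex instead ends the second field at the first of the doubled quotes and yields three: this, then "is " with a trailing space, then ,test 1.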

        That would require me to install Microsoft Windows and acquire and install Microsoft Excel. Could you please tell me how they will be handled?
