harishnuti has asked for the wisdom of the Perl Monks concerning the following question:


Hello Monks,
As the subject says, I need a workaround for an issue I am facing when creating a CSV file.
I am not using the Text::CSV_XS module due to issues and other factors in our environments; currently I am creating the CSV file manually, as below.

#!/usr/bin/perl
use strict;
use warnings;

open my $fh,">","tmp.csv" or die "Unable to open $! \n";

# read the file line by line and delimit with commas
while(<DATA>){
    my @array = split /~/;
    my $str = join(",",@array); # fields joined with commas
    print $fh "$str";
}
close $fh;

__DATA__
col1~col2~col3~col4~col5
data11~data12~data13~data14~data15
data21~data22~data23~data24~data25
data31~data32~data33~data34~data35
data,data41~data42~data43~data44~data45
data51,data52,data,junk,specialchar,sometingdata53~data54~data55

The data is separated by the ~ symbol; I split it and join it with commas (which is equivalent to replacing ~ with a comma).
I have issues when there is a comma in one of the column values, as shown in the sample above.
I need your advice on how to overcome this for the time being; we are considering upgrading to the Text::CSV_XS module soon.
I am thinking of enclosing each value of the array in double quotes to preserve the original comma before writing to the CSV file. How can I achieve that?
If I do have to enclose the data in double quotes, is it better to enclose all values in double quotes, or only the values containing special characters?
In either case I need the monks' advice on how to proceed until I implement the CPAN module.

Replies are listed 'Best First'.
Re: Workaround for my CSV file
by GrandFather (Saint) on Dec 05, 2008 at 07:34 UTC

    Assuming that you really can't use a module (but see Yes, even you can use CPAN), a simple regex substitution does the trick:

    use strict;
    use warnings;

    open my $fh, ">", "tmp.csv" or die "Unable to open $! \n";

    #read the file line by line and delimit with commas
    print join ",", map {s/(.*,.*)/"$1"/; $_} split /~/ while <DATA>;
    close $fh;

    __DATA__
    col1~col2~col3~col4~col5
    data11~data12~data13~data14~data15
    data21~data22~data23~data24~data25
    data31~data32~data33~data34~data35
    data,data41~data42~data43~data44~data45
    data51,data52,data,junk,specialchar,sometingdata53~data54~data55

    Prints:

    col1,col2,col3,col4,col5
    data11,data12,data13,data14,data15
    data21,data22,data23,data24,data25
    data31,data32,data33,data34,data35
    "data,data41",data42,data43,data44,data45
    "data51,data52,data,junk,specialchar,sometingdata53",data54,data55

    Perl's payment curve coincides with its learning curve.

      ...and, if you want to allow '"' in the fields you can extend that to:

      use strict ;
      use warnings ;

      print join ",", map { if (m/[,"]/) { s/(\A|"|\n|(?<!\n)\Z)/"$1/g } ; $_ }
                      split /~/ while <DATA>;

      __DATA__
      col1~col2~col3~col4~col5
      data11~data12~data13~data14~da,data15
      data21~"data22"~d"ata"23~data24~da"ta"25
      data31~data32~data33~data34~"data35"
      data,data41~data42~data43~data44~data45
      data51,data52,data,junk,specialchar,sometingdata53~data54~data55

      which gives:
      col1,col2,col3,col4,col5
      data11,data12,data13,data14,"da,data15"
      data21,"""data22""","d""ata""23",data24,"da""ta""25"
      data31,data32,data33,data34,"""data35"""
      "data,data41",data42,data43,data44,data45
      "data51,data52,data,junk,specialchar,sometingdata53",data54,data55
      
      (The last item on each line has the line ending attached to it, so you have to box a little clever to get the trailing '"' right in all cases. GrandFather used (.*), which won't match a line ending unless you tell it to.)
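
      To make the point about (.*) concrete, here is a small sketch. The variable names are made up for illustration; the sample field is the last one from the second data line above, newline included.

```perl
use strict;
use warnings;

# The last field of a line as it comes out of split: the "\n" is still attached.
my $field = "da,data15\n";

# Without /s, "." stops at the newline, so the closing quote lands before it.
(my $plain = $field) =~ s/(.*,.*)/"$1"/;

# With /s, "." also matches the newline, so the closing quote lands after it.
(my $dotall = $field) =~ s/(.*,.*)/"$1"/s;

print $plain;    # "da,data15" and then the newline
print $dotall;   # "da,data15 then the newline, then the closing quote
```

      So the plain form happens to do the right thing for the last field, which is why the extended regex has to treat the line ending as a special case.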

      Or you can just stick '"' around every item:

      print join '",', map { s/(\A|"|\n)/"$1/g ; $_ } split /~/ while <DATA>;

      which is slightly more straightforward:
      "col1","col2","col3","col4","col5"
      "data11","data12","data13","data14","da,data15"
      "data21","""data22""","d""ata""23","data24","da""ta""25"
      "data31","data32","data33","data34","""data35"""
      "data,data41","data42","data43","data44","data45"
      "data51,data52,data,junk,specialchar,sometingdata53","data54","data55"
      

      One assumes that you don't expect your '~' item separators to appear in any item, not even within '"' or any other escaping mechanism... The deeper you go into this kind of thing, the more you find how useful the stuff on CPAN is!


      Thank you, Mr GrandFather. That was a good regular expression and it solves the current issues, and it was indeed good learning for me. I appreciate your time.
Re: Workaround for my CSV file
by moritz (Cardinal) on Dec 05, 2008 at 07:39 UTC
    If you can't use a module, you can at least look at how the standard modules solve your problem, and mimic that in your own code.
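
    As a rough illustration of that idea, the sketch below applies the quoting rule that CSV writers commonly follow: quote a field only if it contains a comma, a double quote, or a line break, and double any embedded quotes. It is not code copied from Text::CSV_XS or any other module; csv_field and csv_line are made-up names, and the sample lines are adapted from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): quote one field only when needed.
sub csv_field {
    my ($value) = @_;
    # Quote when the field holds a comma, a double quote, or a line break.
    if ($value =~ /[,"\r\n]/) {
        $value =~ s/"/""/g;       # double any embedded quotes
        $value = qq{"$value"};    # then wrap the whole field
    }
    return $value;
}

# Hypothetical helper: turn one '~'-separated line into one CSV line.
sub csv_line {
    my ($line) = @_;
    chomp $line;                          # drop the line ending before splitting
    my @fields = split /~/, $line, -1;    # -1 keeps trailing empty fields
    return join(",", map { csv_field($_) } @fields) . "\n";
}

my @lines = (
    "col1~col2~col3\n",
    qq{data,data41~d"ata"23~data45\n},
);
print csv_line($_) for @lines;
```

    Chomping first means the last field needs no special handling. Like the other answers, this assumes '~' never appears inside a field.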
Re: Workaround for my CSV file
by brsaravan (Scribe) on Dec 05, 2008 at 08:37 UTC
    Here is some simple regex logic that works without splitting the input:

    #! /usr/bin/perl
    use strict;
    use warnings;

    open my $fh,">","tmp.csv" or die "Unable to open $! \n";
    while(<DATA>){
        # Quote the field containing commas. Note that \S+ also matches '~',
        # so this relies on the commas being in the first field, as in the
        # sample data below.
        $_ =~ s/(\S+,)+([^~]*)/"$1$2"/g if $_ =~ /,/;
        $_ =~ s/~/,/g;    # then turn the '~' separators into commas
        print $fh "$_";
    }
    close $fh;

    __DATA__
    col1~col2~col3~col4~col5
    data11~data12~data13~data14~data15
    data21~data22~data23~data24~data25
    data31~data32~data33~data34~data35
    data,data41~data42~data43~data44~data45
    data51,data52,data,junk,specialchar,sometingdata53~data54~data55