in reply to Re^3: Text::CSV_XS and "binary" mode
in thread Text::CSV_XS and "binary" mode

Hey there, the csv file was from an oracle db export of a specific table. The following are 2 lines
456,05-Dec-2002,80,Software,print,,he can't print hes getting error ms +g: 'LPTTS FOR EC-2-1,paper jam,1,C 457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it + goes to switchboard2 - when internal call to 0 it goes to switchboar +d1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C
Using bin mode in CSV files was what i saw in the documentation for solving problems with commas inside fields with commas. Once a comma comes up in the field it all breaks. Is it easier to export a CSV file using a different separation character? Would this solve any carriage return problems? If so what character would you recommend using?

Replies are listed 'Best First'.
Re^5: Text::CSV_XS and "binary" mode
by Tux (Canon) on Jan 26, 2009 at 11:11 UTC

    The default separation character is a comma, the default quotation character is a double quote (").

    If the quotation character appears inside a field, it ought to be escaped with the escape character, which by default is also a double quote.

    Separation characters can only appear inside a field if the complete field is quoted. In that case, the separation character should not be escaped.

    For the two lines of CSV that you posted, I see no problem at all for the default values when using binary => 1 (except of course for the funny typo).


    Enjoy, Have FUN! H.Merijn
      Ah i see, in other words if a field is to include a comma or carriage return that field should be double quoted, i will have a look at the export options and see if i can have some of the fields quoted

        BINGO! :)


        Enjoy, Have FUN! H.Merijn
Re^5: Text::CSV_XS and "binary" mode
by Marshall (Canon) on Jan 26, 2009 at 11:35 UTC
    Ok, below is a idea to get you started, I screwed up the columns, but below is the idea...:

    For parsing comma separated values, I think that Text::ParseWords could serve you well? Instead of split(/,/,$_) below, you might need parse_csv($_)? parse_csv should produce a list that you can use like I did below (assign multiple values on the left of the "=" sign).

    Your question about a different export character is insightful and very smart! If you have control over that, then get the report with say ^ delimited fields instead of ", then all these problems about quotes within quotes, etc. just go away! Just split on /^/ instead of /"//,/! I think that is the most simple and best idea yet! If you can get that, then just follow the example below (of course without my mistakes!-sorry I messed the row order up somehow..but you will figure it out...)

    #!/usr/bin/perl -w use strict; while (<DATA>) { my ($num,$date,$category,$os,$subcat,$action,$desc,$assignto, $status,$extra) = split(/,/,$_); $num ||= ""; print "NUM: $num\n"; $date ||= ""; print "DATE: $date\n"; $category ||= ""; print "CATEGORY: $category\n"; $subcat ||= ""; print "SUBCAT: $subcat\n"; $os ||= ""; print "OS: $os\n"; $desc ||= ""; print "DESCR: $desc\n"; $action ||= ""; print "ACTION: $action\n"; $assignto ||= ""; print "ASSIGNTO: $assignto\n"; $status ||= ""; print "STATUS: $status\n"; $extra ||= ""; print "Extra: $extra\n"; print "\n"; } __DATA__ 456,05-Dec-2002,80,Software,print,,he can't print hes getting error ms +g: 'LPTTS FOR EC-2-1,paper jam,1,C 457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it + goes to switchboard2 - when internal call to 0 it goes to switchboar +d1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C Prints: NUM: 456 DATE: 05-Dec-2002 CATEGORY: 80 SUBCAT: print OS: Software DESCR: he can't print hes getting error msg: 'LPTTS FOR EC-2-1 ACTION: ASSIGNTO: paper jam STATUS: 1 Extra: C NUM: 457 DATE: 05-Dec-2002 CATEGORY: 22 SUBCAT: switchb. OS: Software DESCR: when internal call to ext 444 it goes to switchboard2 - when + internal call to 0 it goes to switchboard1 -- both should go to swit +chboard 1 ACTION: ASSIGNTO: call texchnitian - fixed STATUS: 35 Extra: C

      Please don't promote the use of <INFILE> when dealing with embedded newlines and such. The OP correctly uses Text::CSV_XS to deal with those. split () is NOT an option here. You code will break on the first newline that is part of a field.


      Enjoy, Have FUN! H.Merijn
        From problem statement, the 2 lines given don't have embedded new lines. Didn't realize that was part of the requirement. As you point out, that certainly changes things if that is true!