in reply to Re^2: Text::CSV_XS and "binary" mode
in thread Text::CSV_XS and "binary" mode

I see a number of problems in the posted code. But you are able to get some kind of printout until the end-of-line. I suspect a problem there. You should NOT use bin mode! This can be big trouble! A .CSV file is ASCII, not binary.

I suspect that you have kind of inter-change problem between Windows and Unix. On Windows end-of-line is "\r\n", on Unix this is just "\n".

Normally Perl will do "the right thing" for this translation, eg, it doesn't matter all all. Using Bin mode can defeat this "smarts". That is a different thing than how you move files between systems, but most of these things are pretty smart too and I use a number of them. For an ASCII file you shouldn't use bin mode for the transfer.

post just a couple lines of your file it you can. It is hard for me to understand what you are trying to do from what I've seen so far.

Edit: Looking more at the Text::CSV_XS module and it appears that I am wrong above. This module does have some trouble with \n. I have used this before but only in conjunction with DBI and SQL modules that evidently don't have this problem. Anyway post a couple lines of the CSV db (don't use "real" data that would cause problems), just an example.

Replies are listed 'Best First'.
Re^4: Text::CSV_XS and "binary" mode
by bittis (Sexton) on Jan 26, 2009 at 09:29 UTC
    Hey there, the csv file was from an oracle db export of a specific table. The following are 2 lines
    456,05-Dec-2002,80,Software,print,,he can't print hes getting error ms +g: 'LPTTS FOR EC-2-1,paper jam,1,C 457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it + goes to switchboard2 - when internal call to 0 it goes to switchboar +d1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C
    Using bin mode in CSV files was what i saw in the documentation for solving problems with commas inside fields with commas. Once a comma comes up in the field it all breaks. Is it easier to export a CSV file using a different separation character? Would this solve any carriage return problems? If so what character would you recommend using?

      The default separation character is a comma, the default quotation character is a double quote (").

      If the quotation character appears inside a field, it ought to be escaped with the escape character, which by default is also a double quote.

      Separation characters can only appear inside a field if the complete field is quoted. In that case, the separation character should not be escaped.

      For the two lines of CSV that you posted, I see no problem at all for the default values when using binary => 1 (except of course for the funny typo).


      Enjoy, Have FUN! H.Merijn
        Ah i see, in other words if a field is to include a comma or carriage return that field should be double quoted, i will have a look at the export options and see if i can have some of the fields quoted
      Ok, below is a idea to get you started, I screwed up the columns, but below is the idea...:

      For parsing comma separated values, I think that Text::ParseWords could serve you well? Instead of split(/,/,$_) below, you might need parse_csv($_)? parse_csv should produce a list that you can use like I did below (assign multiple values on the left of the "=" sign).

      Your question about a different export character is insightful and very smart! If you have control over that, then get the report with say ^ delimited fields instead of ", then all these problems about quotes within quotes, etc. just go away! Just split on /^/ instead of /"//,/! I think that is the most simple and best idea yet! If you can get that, then just follow the example below (of course without my mistakes!-sorry I messed the row order up somehow..but you will figure it out...)

      #!/usr/bin/perl -w use strict; while (<DATA>) { my ($num,$date,$category,$os,$subcat,$action,$desc,$assignto, $status,$extra) = split(/,/,$_); $num ||= ""; print "NUM: $num\n"; $date ||= ""; print "DATE: $date\n"; $category ||= ""; print "CATEGORY: $category\n"; $subcat ||= ""; print "SUBCAT: $subcat\n"; $os ||= ""; print "OS: $os\n"; $desc ||= ""; print "DESCR: $desc\n"; $action ||= ""; print "ACTION: $action\n"; $assignto ||= ""; print "ASSIGNTO: $assignto\n"; $status ||= ""; print "STATUS: $status\n"; $extra ||= ""; print "Extra: $extra\n"; print "\n"; } __DATA__ 456,05-Dec-2002,80,Software,print,,he can't print hes getting error ms +g: 'LPTTS FOR EC-2-1,paper jam,1,C 457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it + goes to switchboard2 - when internal call to 0 it goes to switchboar +d1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C Prints: NUM: 456 DATE: 05-Dec-2002 CATEGORY: 80 SUBCAT: print OS: Software DESCR: he can't print hes getting error msg: 'LPTTS FOR EC-2-1 ACTION: ASSIGNTO: paper jam STATUS: 1 Extra: C NUM: 457 DATE: 05-Dec-2002 CATEGORY: 22 SUBCAT: switchb. OS: Software DESCR: when internal call to ext 444 it goes to switchboard2 - when + internal call to 0 it goes to switchboard1 -- both should go to swit +chboard 1 ACTION: ASSIGNTO: call texchnitian - fixed STATUS: 35 Extra: C

        Please don't promote the use of <INFILE> when dealing with embedded newlines and such. The OP correctly uses Text::CSV_XS to deal with those. split () is NOT an option here. You code will break on the first newline that is part of a field.


        Enjoy, Have FUN! H.Merijn