in thread Text::CSV_XS and "binary" mode

Hey there, I tried calling binmode INFILE but it seems to have made no difference. An example record is this (printed out nicely):
LOGID: 123
LOGDATE: 04-Dec-2006
EMPID: 23
CATEGORY: Software
SUBCAT:
OS:
DESCR: he needs russian fonts
ACTION: installed cyrillic support needs additional fonts
ASSIGNTO: 1
STATUS: C

Re^3: Text::CSV_XS and "binary" mode
by moritz (Cardinal) on Jan 26, 2009 at 08:43 UTC
    That still doesn't tell me what you expected, what you get, and how these two differ. You might want to use Data::Dumper (set $Data::Dumper::Useqq = 1) to get an accurate description of your string.
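    A minimal sketch of that suggestion, assuming a made-up field value that still carries a Windows line ending (the variable names are for illustration only). With Useqq set, control characters are printed as escapes, so a stray carriage return becomes visible:

```perl
use strict;
use warnings;
use Data::Dumper;

# Print strings double-quoted, with control characters shown as escapes
$Data::Dumper::Useqq = 1;

# Hypothetical last field of a record read from a file with CRLF line endings
my $field = "C\r\n";

my $dump = Dumper($field);
print $dump;    # the carriage return and newline show up as \r and \n
```

    Dumping each parsed field this way makes it easy to compare what you expected with what you actually got.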
      The line itself is returned correctly. The problem is that if, for example, the ACTION field contains a comma, then what I get in the next field is the string after the comma and not the correct ASSIGNTO ID, which is then placed in the STATUS field instead, leaving me with an additional field at the end. Is this clearer? :/
        So basically you want the separator to be something different than a comma? Use the sep_char option then. The Text::CSV_XS documentation is quite clear on that, I think...
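        A rough sketch of the sep_char option, assuming the export could be redone with ';' between fields. The record below is made up for illustration and is not from the original file:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Assumption: the export was redone with ';' between fields
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ';' })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical record; the comma inside ACTION is now ordinary data
my $line = '456;05-Dec-2002;80;Software;print;;printer error;paper jam, tray 2;1;C';

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

print "ACTION:   $fields[7]\n";
print "ASSIGNTO: $fields[8]\n";
```

        This only moves the problem if the new separator can itself show up inside a field.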
Re^3: Text::CSV_XS and "binary" mode
by Marshall (Canon) on Jan 26, 2009 at 09:06 UTC
    I see a number of problems in the posted code. But you are able to get some kind of printout up to the end of the line, so I suspect a problem there. You should NOT use binmode! This can be big trouble! A .CSV file is ASCII, not binary.

    I suspect that you have some kind of interchange problem between Windows and Unix. On Windows the end-of-line is "\r\n"; on Unix it is just "\n".

    Normally Perl will do "the right thing" for this translation, i.e. it doesn't matter at all. Using binmode can defeat these "smarts". How you move files between systems is a separate matter, but most transfer tools are pretty smart about this too, and I use a number of them. For an ASCII file you shouldn't use binary mode for the transfer.
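    To make the line-ending point concrete, here is a small sketch, assuming a record that reaches the script with its Windows "\r\n" still attached (for example, a Windows export read on Unix). The sample string is made up:

```perl
use strict;
use warnings;

# Hypothetical record as it might arrive from a Windows export: CRLF at the end
my $raw = "paper jam,1,C\r\n";

# chomp removes only the trailing "\n" (the default $/), so "\r" survives
my $chomped = $raw;
chomp $chomped;

# Stripping an optional "\r" as well handles both conventions
(my $clean = $raw) =~ s/\r?\n\z//;

printf "after chomp: %d chars, after s///: %d chars\n",
    length $chomped, length $clean;
```

    A leftover "\r" ends up glued to the last field of each record, which is easy to miss in normal output.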

    Post just a couple of lines of your file if you can. It is hard for me to understand what you are trying to do from what I've seen so far.

    Edit: Looking more at the Text::CSV_XS module, it appears that I am wrong above. This module does have some trouble with \n. I have used it before, but only in conjunction with DBI and SQL modules that evidently don't have this problem. Anyway, post a couple of lines of the CSV db (don't use "real" data that would cause problems), just an example.

      Hey there, the CSV file was from an Oracle DB export of a specific table. The following are 2 lines:
      456,05-Dec-2002,80,Software,print,,he can't print hes getting error msg: 'LPTTS FOR EC-2-1,paper jam,1,C
      457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it goes to switchboard2 - when internal call to 0 it goes to switchboard1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C
      Using binary mode was what I saw in the documentation for solving problems with commas inside fields. Once a comma comes up in a field, it all breaks. Is it easier to export the CSV file using a different separator character? Would this also solve any carriage return problems? If so, what character would you recommend using?

        The default separation character is a comma, the default quotation character is a double quote (").

        If the quotation character appears inside a field, it ought to be escaped with the escape character, which by default is also a double quote.

        Separation characters can only appear inside a field if the complete field is quoted. In that case, the separation character should not be escaped.

        For the two lines of CSV that you posted, I see no problem at all for the default values when using binary => 1 (except of course for the funny typo).
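        The quoting rules above can be sketched as follows. The record is made up for illustration: its ACTION field contains both a comma and an embedded double quote, so the whole field is quoted and the inner quotes are doubled:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Defaults: sep_char ',', quote_char '"', escape_char '"'
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical record: ACTION is quoted because it contains a comma,
# and the quotes around A are escaped by doubling them
my $line = q{456,05-Dec-2002,80,Software,print,,printer error,"paper jam, tray ""A""",1,C};

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

printf "%d fields, ACTION = %s\n", scalar @fields, $fields[7];
```

        If the export does not quote fields that contain commas, no parser setting can recover the original columns; the fix then has to happen on the export side.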


        Enjoy, Have FUN! H.Merijn
        Ok, below is an idea to get you started. I screwed up the columns, but below is the idea...:

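        For parsing comma separated values, I think that Text::ParseWords could serve you well. Instead of split(/,/,$_) below, you might need something like parse_csv($_), a small helper you would write around the quotewords function from Text::ParseWords (the module itself does not export parse_csv). parse_csv should produce a list that you can use like I did below (assign multiple values on the left of the "=" sign).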
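        A minimal sketch of such a helper, assuming the name parse_csv and a made-up record whose ACTION field is quoted because it contains a comma:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical wrapper; the name parse_csv is an assumption for illustration
sub parse_csv {
    my ($line) = @_;
    # ',' is the delimiter; 0 means "strip the quotes from each field"
    return quotewords(',', 0, $line);
}

# Hypothetical record with a quoted ACTION field that contains a comma
my $line = '456,05-Dec-2002,80,Software,print,,printer error,"paper jam, tray 2",1,C';

my @fields = parse_csv($line);
print scalar(@fields), " fields; ACTION = $fields[7]\n";
```

        One caveat: quotewords also treats single quotes as quote characters and the backslash as an escape, so data containing apostrophes (like "can't") may not come through unchanged.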

        Your question about a different export character is insightful and very smart! If you have control over that, then get the report with, say, ^-delimited fields instead of comma-delimited ones; as long as ^ never appears in the data, all these problems about quotes within quotes, etc. just go away! Just split on /\^/ (the ^ has to be escaped, because a bare ^ is a regex anchor) instead of /,/! I think that is the simplest and best idea yet! If you can get that, then just follow the example below (of course without my mistakes! Sorry, I messed the column order up somehow, but you will figure it out...)
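        A short sketch of the caret idea, assuming the export can be produced with '^' between fields and that '^' never occurs in the data. The record is made up:

```perl
use strict;
use warnings;

# Hypothetical record, assuming the export used '^' between fields
my $line = "456^05-Dec-2002^80^Software^print^^printer error^paper jam, tray 2^1^C\n";
chomp $line;

# '^' is a regex anchor, so it must be escaped to match a literal caret.
# The limit of -1 keeps trailing empty fields instead of dropping them.
my @fields = split /\^/, $line, -1;

print scalar(@fields), " fields; ACTION = $fields[7]\n";
```

        The comma inside ACTION no longer matters, because it is not the separator any more.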

        #!/usr/bin/perl -w
        use strict;

        while (<DATA>) {
            my ($num,$date,$category,$os,$subcat,$action,$desc,$assignto,
                $status,$extra) = split(/,/,$_);
            $num      ||= ""; print "NUM: $num\n";
            $date     ||= ""; print "DATE: $date\n";
            $category ||= ""; print "CATEGORY: $category\n";
            $subcat   ||= ""; print "SUBCAT: $subcat\n";
            $os       ||= ""; print "OS: $os\n";
            $desc     ||= ""; print "DESCR: $desc\n";
            $action   ||= ""; print "ACTION: $action\n";
            $assignto ||= ""; print "ASSIGNTO: $assignto\n";
            $status   ||= ""; print "STATUS: $status\n";
            $extra    ||= ""; print "Extra: $extra\n";
            print "\n";
        }
        __DATA__
        456,05-Dec-2002,80,Software,print,,he can't print hes getting error msg: 'LPTTS FOR EC-2-1,paper jam,1,C
        457,05-Dec-2002,22,Software,switchb.,,when internal call to ext 444 it goes to switchboard2 - when internal call to 0 it goes to switchboard1 -- both should go to switchboard 1,call texchnitian - fixed ,35,C

        Prints:

        NUM: 456
        DATE: 05-Dec-2002
        CATEGORY: 80
        SUBCAT: print
        OS: Software
        DESCR: he can't print hes getting error msg: 'LPTTS FOR EC-2-1
        ACTION:
        ASSIGNTO: paper jam
        STATUS: 1
        Extra: C

        NUM: 457
        DATE: 05-Dec-2002
        CATEGORY: 22
        SUBCAT: switchb.
        OS: Software
        DESCR: when internal call to ext 444 it goes to switchboard2 - when internal call to 0 it goes to switchboard1 -- both should go to switchboard 1
        ACTION:
        ASSIGNTO: call texchnitian - fixed
        STATUS: 35
        Extra: C