bittis has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone, I am trying to parse a CSV file. The problem I'm having is that certain fields in the file contain commas as well as carriage return characters. I tried using Text::CSV_XS, set up as follows:
$csv = Text::CSV_XS->new({binary => 1});
This does not seem to have any effect and returns broken records. The following is an example of what I did:
$csv = Text::CSV_XS->new({binary => 1});
while (<INFILE>) {
    #while ($csv->parse($_))
    if (/"X"/) {    # The trash data has these 3 characters in it
        print "THERE IS TRASH IN THE FILE:: $_\n";
    }
    elsif ($first_line==1) {
        $first_line=0;
    }
    else {
        # Now to deal with the data I want to keep
        if ($csv->parse($_)) {          # checks to see if data exists in $_ and parses it if it does
            my @fields=$csv->fields;    # puts the values from each field in an array
            my $elements=@fields;       # gets the number of elements in the array
            my $x=0;
            # for ($x=0;$x<$elements;$x++) {
            # print "$fields[$x]\t";
            print "\n";
            if ($fields[$x]=~ m/\d+/i) {
                print "LOGID: $fields[$x++]\n";
                print "LOGDATE: $fields[$x++]\n";
                print "EMPID: $fields[$x++]\n";
                print "CATEGORY: $fields[$x++]\n";
                print "SUBCAT: $fields[$x++]\n";
                print "OS: $fields[$x++]\n";
                print "DESCR: $fields[$x++]\n";
                print "ACTION: $fields[$x++]\n";
                print "ASSIGNTO: $fields[$x++]\n";
                print "STATUS: $fields[$x++]\n";
            }
            else {
                print "\problematic field found ";
                $counterp++;
                print "LOGID: $fields[$x++]\n";
                # $asdf=<>;
            }
            # }
            $counter++;
        }
    }
}
I am not sure what I am doing wrong, but it probably has to do with while (<INFILE>). I would appreciate any help I could get.

Replies are listed 'Best First'.
Re: Text::CSV_XS and "binary" mode
by Tux (Canon) on Jan 26, 2009 at 07:49 UTC

    Did you consider using getline () instead of <> and parse ()?

    my $csv = Text::CSV->new ({ binary => 1 });
    while (my $row = $csv->getline (*INFILE)) {
        my @fields = @$row;
    }
    $csv->eof or $csv->error_diag;

    You can change the loop to catch your trash records.
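
    A minimal sketch of how that loop might look with the trash check folded in. It reads from an in-memory string rather than the real INFILE, and the sample rows, the "X" test on parsed fields, and the header skip are assumptions modelled on the question, not something tested against the actual file.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: a header, one good record whose last field
# contains an embedded newline, and one "trash" row containing "X".
my $data = join "",
    qq{LOGID,LOGDATE,DESCR,ACTION\r\n},
    qq{123,04-Dec-2006,"needs fonts, russian","installed support\nneeds more fonts"\r\n},
    qq{"X",trash,trash,trash\r\n};

open my $fh, "<", \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

my $header = $csv->getline($fh);    # consume the header row
my (@good, @trash);
while (my $row = $csv->getline($fh)) {
    # Trash check on the parsed fields instead of on the raw line
    if (grep { defined && $_ eq "X" } @$row) {
        push @trash, $row;
        next;
    }
    push @good, $row;
}
$csv->eof or $csv->error_diag;      # report a parse error, if any
close $fh;

printf "%d good, %d trash\n", scalar @good, scalar @trash;
```

    Because getline reads from the handle itself, a quoted field that spans several physical lines stays in one record, which a line-by-line while (<INFILE>) loop cannot guarantee.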


    Enjoy, Have FUN! H.Merijn
      Hey there. That didn't work :( I now only get 44 rows of records before it breaks. I also tried setting binmode on the file as suggested. It still breaks when it meets a carriage return character. Where should I be catching my trash records?

        If it breaks because of parsing errors, $csv->error_diag () should give you a clear error message, which could hint you towards using other options. Showing the error message could help others here to give you advice on how to continue.
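
        A small sketch of capturing that diagnostic, assuming a deliberately broken input line (a quoted field that is never closed). The sample line and variable names are made up for illustration; the exact code and message depend on what is actually wrong with the real file.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });

# Hypothetical broken input: the quoted field is never closed.
my $bad_line = qq{123,"he needs russian fonts};

my ($code, $message, $position);
if ($csv->parse($bad_line)) {
    print "Parsed OK: ", join("|", $csv->fields), "\n";
}
else {
    # In list context error_diag returns the error code, a description,
    # and the position where parsing stopped (newer releases return
    # further values after these three).
    ($code, $message, $position) = $csv->error_diag;
    printf "Parse failed: code=%s message=%s position=%s\n",
        $code, $message, $position // "?";
}
```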


        Enjoy, Have FUN! H.Merijn
Re: Text::CSV_XS and "binary" mode
by moritz (Cardinal) on Jan 26, 2009 at 07:49 UTC
    Did you also call binmode INFILE?

    What exactly is in your record, and what do you expect to be there?

      Hey there, I tried calling binmode INFILE but it seems to have made no difference. An example record is this (printed out nicely):
      LOGID: 123
      LOGDATE: 04-Dec-2006
      EMPID: 23
      CATEGORY: Software
      SUBCAT:
      OS:
      DESCR: he needs russian fonts
      ACTION: installed cyrillic support needs additional fonts
      ASSIGNTO: 1
      STATUS: C
        That still doesn't tell me what you expected, what you get, and how these two differ. You might want to use Data::Dumper (set $Data::Dumper::Useqq = 1) to get an accurate description of your string.
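
        A sketch of what that looks like, using a made-up record with a CR LF pair inside a quoted field. With Useqq set, control characters are printed as visible escapes such as \r and \n instead of being sent raw to the terminal.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical raw record with a CR LF pair inside a quoted field.
my $line = qq{123,"installed cyrillic support\r\nneeds additional fonts",C\n};

local $Data::Dumper::Useqq = 1;    # show control characters as escapes
my $dump = Dumper($line);
print $dump;
```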
        I see a number of problems in the posted code. But you are able to get some kind of printout until the end-of-line. I suspect a problem there. You should NOT use bin mode! This can be big trouble! A .CSV file is ASCII, not binary.

        I suspect that you have some kind of interchange problem between Windows and Unix. On Windows, end-of-line is "\r\n"; on Unix it is just "\n".

        Normally Perl will do "the right thing" for this translation, i.e., it doesn't matter at all. Using bin mode can defeat these "smarts". That is a different thing from how you move files between systems, but most of those tools are pretty smart too, and I use a number of them. For an ASCII file you shouldn't use bin mode for the transfer.
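
        One caveat, sketched below under the assumption of a Unix-like system: there the default I/O layer does no CR LF translation, so a file written on Windows keeps its "\r" after chomp. The two-line sample and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical two-line file body with Windows line endings.
my $data = "123,Software\r\n124,Hardware\r\n";
open my $fh, "<", \$data or die "Cannot open in-memory file: $!";

my (@chomped, @stripped);
while (my $line = <$fh>) {
    my $copy = $line;
    chomp $copy;                 # removes "\n" only (with the default $/)
    push @chomped, $copy;

    $line =~ s/\r?\n\z//;        # removes "\n" or "\r\n"
    push @stripped, $line;
}
close $fh;

printf "chomp leaves a trailing CR: %s\n", ($chomped[0]  =~ /\r\z/ ? "yes" : "no");
printf "regex leaves a trailing CR: %s\n", ($stripped[0] =~ /\r\z/ ? "yes" : "no");
```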

        Post just a couple of lines of your file if you can. It is hard for me to understand what you are trying to do from what I've seen so far.

        Edit: Looking more at the Text::CSV_XS module, it appears that I am wrong above. This module does have some trouble with \n. I have used it before, but only in conjunction with DBI and SQL modules that evidently don't have this problem. Anyway, post a couple of lines of the CSV db (don't use "real" data that would cause problems), just an example.