f77coder has asked for the wisdom of the Perl Monks concerning the following question:

hello, i'm dealing with a csv file that has some 'bad' lines. how do i handle multiple missing values, i.e. several commas in a row? my code works for single missing values.

multiple missing values happen rarely, so i'd like a solution that is still fast at parsing. i'm using Parse::CSV
my $csv = Parse::CSV->new( file => $xFile[$k], sep_char => ',', names => 1, empty_is_undef => 1, auto_diag => 1, binary => 0, header =>'auto' );
11004516,0,0,9,9,3,12477,,,4,,0,,,3,38a947a1,b66b7850,6a14f9b9
11006995,1,,-1,,,,,,,,,,,,fbc55dae,9a89b36c,58e67aaf,f600ec0b,

the error is "Argument "" isn't numeric in numeric eq (==)"

i want to substitute '0' for blanks or missing values. i've tried code with various comparisons like
if(length($str_check)==0){return ('0');}else{return($str_check);}
if($str_check = undef){do stuff};{return ('0');}else{return($str_check);}

any help appreciated.

thanks

Here is my full code. Yes, I admit I am new to serious Perl scripts and this code is not optimal.

use 5.12.0;
use warnings;
use strict;
use Carp;
use sigtrap 'handler' => \&myhand, 'INT';
use Cwd;
use Benchmark;
use File::Basename;
use Acme::Comment type => 'C++', own_line => 1;
use English '-no_match_vars';
#####################################################################
use Parse::CSV;
use Text::CSV_XS;
#####################################################################
system('clear');
my $dbg_1=0;
my $dbg_2=1;
my $start=time;
my $t0 = new Benchmark;
print "\n Current Date and Time -> " . localtime() . "\n";

my $Base='/Users/Documents/matlab/projects/kaggle/criteo';
my $s_DIR=$Base.'/input/tmp';
my $p_DIR=$Base.'/output/data/pass';
my $f_DIR=$Base.'/output/data/fail';

my @xFile = grep {-f $_}glob( "$s_DIR/x*");
#
if($dbg_1){
    foreach my $f (@xFile) {
        my $filesize = -s $f;
        printf "%-25s size is %15d \n", ($f, $filesize);
    };
};
#
#initialize vars
my $k=0;
my $noLines=9e3;
my $count=0;
my $result=0;
my $temp=0;
my $value=0;
my $name="";
my $n=@xFile;

for ($k = 0; $k <= $n; $k++){
    my $indexF=0;
    my $indexP=0;
    (my $suffix,my $path,$name)=fileparse($xFile[$k], "\.[^.]*" );
    print 'processing '.$name."\n";
    my $f_Pass=$p_DIR."/pass_table_".$name.'.txt';
    my $f_Fail=$f_DIR."/fail_table_".$name.'.txt';
    open(DATA,">".$f_Pass) || die "Can't open output file";
    open(DATA2,">".$f_Fail) || die "Can't open output file";
    if($dbg_1){
        print "xF=> ".$k."\n";
        print "xFile[xF]=> ".$xFile[$k]."\n";
        print "name=> ".$name." \n";
        print "path=> ".$path." \n";
        print "suffix=> ".$suffix." \n";
        print "f_Pass=> ".$f_Pass."\n";
        print "f_Fail=> ".$f_Fail."\n";
    };
    my $csv = Parse::CSV->new(
        file => $xFile[$k],
        sep_char => ',',
        names => 1,
        empty_is_undef => 1,
        blank_is_undef => 1,
        auto_diag => 1,
        binary => 1,
        header =>'auto',
        callbacks => { after_parse => sub { $_ ||= 0 for @{$_[1] } },}
    );
    my @hash = $csv->names; #returns hash
    my @vals = values @hash; #hash to array
    # for ($count = 0; $count <= $noLines; $count++) {
    # $value = $csv->fetch;
    while ( $value = $csv->fetch ){
        if($value->{$vals[1]}==1){
            for $k (2 .. $#vals) {
                $temp=$value->{$vals[$k]};
                $result=check_blank($temp);
                process_table($k,$result);
            };
            printf DATA "\n";
            $indexP=$indexP+1;
        }else{
            for $k (2 .. $#vals) {
                $temp=$value->{$vals[$k]};
                $result=check_blank($temp);
                process_table2($k,$result);
            };
            printf DATA2 "\n";
            $indexF=$indexF+1;
        };
    };
    print " totalP $indexP totalF ".($indexF-0)." total ".($indexP+$indexF)." \n";
    printf "%% totalP/(totalF+totalP)= %.2f %% \n",($indexP/($indexP+$indexF)*100);
    close(DATA) || die "Couldn't close output file properly";
    close(DATA2) || die "Couldn't close output file properly";
};
######################## sub ############################################
sub check_blank{
    my $str_check= $_[0];
    if((length($str_check)==0)) {
        return ('0');
    }else{
        return($str_check);
    };
    exit 1;
};
sub process_table{
    my $kk= $_[0];
    my $result= $_[1];
    if($kk==1){
        #do something here
    }else{
        printf DATA $result." ";
    };
    return;
    exit 1;
};
sub process_table2{
    my $kk= $_[0];
    my $result= $_[1];
    if($kk==1){
        #do something here
    }else{
        printf DATA2 $result." ";
    };
    return;
    exit 1;
};
########################system ########################################
my $t1 = new Benchmark;
my $td = timediff($t1, $t0);
print "Code took:",timestr($td),"\n";
printf "++Finished program in ->\t %5.2f seconds\n",time-$start;
print "\n";
##########################################################################
sub myhand {
    print "\n caught $SIG{INT}", @_;
    close(DATA) || die "Couldn't close output file properly";
    print "\nHey Stop that SIG hurts!";
    print "\nCleaning up now...";
    exit 1;
};
sub pad (){
    my ( $num, $len ) = @_;
    return '0' x ( $len - length $num ) . $num;
    exit 1;
};
###########################################################################
exit 1;

Replies are listed 'Best First'.
Re: csv parsing with multiple missing values/multiple commas
by Tux (Canon) on Aug 02, 2014 at 08:20 UTC

    The new Text::CSV_XS has callbacks for that:

    my $aoh = csv (in => $xFile[$k], headers => "auto", callbacks => { after_parse => sub { $_ ||= 0 for @{$_[1]} }});

    As an example:

    update: as Parse::CSV uses Text::CSV_XS underneath, you can just add the callback to your code if you have a recent enough version of Text::CSV_XS.

    my $csv = Parse::CSV->new (
        file           => $xFile[$k],
        header         => "auto",
        names          => 1,
        empty_is_undef => 1,
        auto_diag      => 1,
        callbacks      => { after_parse => sub { $_ ||= 0 for @{$_[1]} } },
    );
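To see what the callback body does to a single parsed row, here is a minimal core-Perl sketch. It assumes the fields are unquoted, so a plain split can stand in for the real parser (a simplification for the demo, not how Text::CSV_XS works internally), and the helper name fill_blanks is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the same statement as the after_parse callback.
# "for" aliases $_ to each element, so the row is changed in place.
sub fill_blanks {
    my ($row) = @_;
    $_ ||= 0 for @$row;    # undef and "" both become 0
    return $row;
}

# Assumption: no quoted fields, so split is good enough here.
# The -1 limit keeps trailing empty fields, as in a line ending with a comma.
my $line = '11006995,1,,-1,,,fbc55dae,';
my @row  = split /,/, $line, -1;

fill_blanks(\@row);
my $filled = join ',', @row;
print "$filled\n";
```

Because the loop works through aliases, the callback needs no return value: the parser's own row array is what gets patched.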


    Enjoy, Have FUN! H.Merijn
      Hello, Thanks for your comment. I did have to update my version of Text::CSV_XS but I still get the errors.
        Try this:
        #!perl
        use 5.12.0;
        use warnings;
        use strict;
        use File::Basename;
        use Text::CSV_XS;
        use Benchmark;

        my $start = time;
        my $t0 = new Benchmark;
        print "\n Current Date and Time -> " . localtime() . "\n";

        my $Base = 'c:/temp';
        my $s_DIR=$Base.'';
        my $p_DIR=$Base.'';
        my $f_DIR=$Base.'';

        my @xFile = grep {-f $_} glob( "$s_DIR/x*.csv");

        for my $csvfile (@xFile){

          # input
          my ($name,$dir,$ext) = fileparse($csvfile, qr/\.[^.]*/ );
          print "processing $name \n";
          open my $IN,'<',$csvfile
            or die "Could not open $csvfile : $!";

          # outputs
          my $f_Pass = $p_DIR."/pass_table_".$name.'.txt';
          my $f_Fail = $f_DIR."/fail_table_".$name.'.txt';
          open my $PASS,'>',$f_Pass
            or die "Can't open output file $f_Pass : $!";
          open my $FAIL,'>',$f_Fail
            or die "Can't open output file $f_Fail : $!";

          # process
          my $indexF = 0;
          my $indexP = 0;
          my $csv = Text::CSV_XS->new({
            auto_diag => 1,
            binary    => 1,
            callbacks => { after_parse => sub { $_ ||= 0 for @{$_[1] } }, }
          });

          my $header = $csv->getline($IN);
          while ( my $colref = $csv->getline($IN) ){
            my $col0 = shift @$colref;
            my $col1 = shift @$colref;
            if( $col1 == 1 ){
              print $PASS join " ",@$colref,"\n";
              $indexP = $indexP + 1;
            } else {
              print $FAIL join " ",@$colref,"\n";
              $indexF = $indexF + 1;
            };
          };

          # report
          print " totalP $indexP totalF ".($indexF-0)." total ".($indexP+$indexF)." \n";
          printf "%% totalP/(totalF+totalP) = %.2f %% \n",($indexP/($indexP+$indexF)*100);
          close $PASS or die "Couldn't close output file $f_Pass : $!";
          close $FAIL or die "Couldn't close output file $f_Fail : $!";
        };

        my $t1 = new Benchmark;
        my $td = timediff($t1, $t0);
        printf "\nCode took: %s\n",timestr($td);
        printf "++Finished program in -> %5.2f seconds\n\n",time-$start;
        poj
Re: csv parsing with multiple missing values/multiple commas
by Corion (Patriarch) on Aug 02, 2014 at 06:27 UTC

    You should enable warnings and actually look at them.

    if($str_check = undef)

    This check is never true and always overwrites $str_check.
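A small sketch of that point, using a made-up starting value: after the "test" the variable has lost its contents and the branch has not run.

```perl
use strict;
use warnings;

my $str_check  = 'abc';    # illustrative value, not from the original data
my $branch_ran = 0;

{
    # Warnings are switched off in this block so the demo output stays clean.
    no warnings;
    if ($str_check = undef) {    # assignment, not a comparison
        $branch_ran = 1;
    }
}

print defined($str_check) ? "still defined\n" : "overwritten with undef\n";
print "branch ran: $branch_ran\n";
```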

    I'm not sure whether your problems are actually related to the code you've shown. The problems seem to be mostly data and logic related. Please help us help you better and show us the loop where you process your data, the warnings and errors you get and the output you expect.

      Hello, I've posted my code.

      for this input

      11504515,1,,0,7,,29891,32,18,2,5,,1,,,05db9164,207b2d81,09ade5f6,f7322fb9,25c83c98,fbad5c96,c3c334be,0b153874,a73ee510,e25c6d5e,c4d946cf,a671c1e0,96c2b31e,b28479f6,d295078c,5ee88d62,e5ba7672,966c77d8,21ddcdc9,b1252a9d,c336836b,,3a171ecb,9fa3e01a,001f3601,59b96d68

      i want this array, with blanks/missing values filled with 0

      11504515,1,0,0,7,0,29891,32,18,2,5,0,1,0,0,05db9164,207b2d81,09ade5f6,f7322fb9,25c83c98,fbad5c96,c3c334be,0b153874,a73ee510,e25c6d5e,c4d946cf,a671c1e0,96c2b31e,b28479f6,d295078c,5ee88d62,e5ba7672,966c77d8,21ddcdc9,b1252a9d,c336836b,,3a171ecb,9fa3e01a,001f3601,59b96d68

      this is the error

      Argument "" isn't numeric in numeric eq (==) at ./test.pl line 91, <GEN2> line 4516.

        So maybe you can also show us line 91?

        Most likely, you have something like:

        my $cell= $row->[ 10 ]; if( $cell == 42 ) { ... }

        You could fix this by conditionally setting $cell to zero:

        my $cell= $row->[ 10 ]; $cell ||= 0; if( $cell == 42 ) { ... }
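        One detail worth knowing when picking the operator: ||= replaces any false value, while //= (Perl 5.10 and later) replaces undef only. The sketch below uses made-up variable names and captures warnings in an array purely so the difference is visible; it assumes the cell arrives as an empty string, which is what you get when the parser is not told to turn empty fields into undef.

```perl
use 5.010;
use strict;
use warnings;

# Collect warnings instead of printing them, for the demo only.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $or_cell  = '';    # an empty CSV field seen as an empty string
my $dor_cell = '';

$or_cell  ||= 0;      # replaces undef, "" and 0
$dor_cell //= 0;      # replaces undef only, so "" survives

my $or_match  = ($or_cell  == 0) ? 1 : 0;    # clean numeric comparison
my $dor_match = ($dor_cell == 0) ? 1 : 0;    # triggers the "isn't numeric" warning

printf "warnings caught: %d\n", scalar @warnings;
```

        So //= is only enough if the empty fields really come back as undef; otherwise ||= (or an explicit length check) is the safer choice.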
Re: csv parsing with multiple missing values/multiple commas
by Laurent_R (Canon) on Aug 02, 2014 at 11:36 UTC
    if($str_check = undef){do stuff};{return ('0');}else{return($str_check);}
    Several errors here, some of which would be detected by the compiler using strictures and warnings. First:
    if($str_check = undef) ...
    should be something like this:
    if (! defined $str_check) ...
    Next, do stuff is incorrect syntax, but I assume you've put it there as a summary instead of your real code. The problem, though, is that it would be good to have the real code. Anyway, when you have:
    if($str_check = undef){do_stuff()};
    The conditional stops at the semi-colon, so the block that returns '0' is no longer tied to the test, and the else that follows a bare block is a syntax error, so the else branch can never be executed. Maybe you want something like this:
    return $str_check if defined $str_check; return '0';
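    Put together as a complete sub, this could look like the sketch below. It also treats the empty string as blank, which is an assumption on my part based on the length test in the original check_blank; the sample values are just illustrative.

```perl
use strict;
use warnings;

# Sketch of check_blank: '0' for undef or the empty string,
# otherwise the value is returned unchanged.
sub check_blank {
    my ($str_check) = @_;
    return '0' unless defined $str_check && length $str_check;
    return $str_check;
}

my @checked = map { check_blank($_) } (undef, '', '0', '-1', '38a947a1');
my $joined  = join ',', @checked;
print "$joined\n";
```

    Testing defined first means length is never called on undef, so the sub behaves the same whether or not the parser turns empty fields into undef.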