PerlMonks  

Unable to detect error from Text::CSV

by dd-b (Monk)
on Jun 09, 2017 at 22:47 UTC ( [id://1192449] : perlquestion )

dd-b has asked for the wisdom of the Perl Monks concerning the following question:

Running Text::CSV_XS v 1.29 (not quite current) on FreeBSD 10.3, Perl 5.24.1 (all installed from standard packages).

(Not using the header-reading function for this test, which was at least part of what the Text::CSV_XS 1.30 fix was about).

In a loop reading lines of data via $csv->getline_hr(), I get warnings on some lines (printed to stderr, I think), but I am unable to detect an error after the read using any of the methods I found in the docs ( $csv->status(), $csv->error_input(), and $csv->error_diag() ). Am I perhaps misunderstanding the error reporting completely?

The output of a run, showing the error, is like this:

[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ ./readtpexport.pl play-thumbs-nohead.txt
play-thumbs-nohead.txt
New line
status:
error_input:
error_diag:
End of line processing
New line
status:
error_input:
error_diag:
End of line processing
Argument "" isn't numeric in subroutine entry at /usr/local/lib/perl5/site_perl/mach/5.24/Text/CSV_XS.pm line 867, <$ifh> line 3.
New line
status:
error_input:
error_diag:
End of line processing
New line
status:
error_input:
error_diag:
End of line processing
[
   {
      "path":"PHOTO_CD\\IMAGES",
      "vollabel":" PCD0138",
      "keywords":[],
      "imagename":"IMG0002.PCD",
      "vnetname":"\\\\ddb\\r$",
      "vtype":5
   },
   {
      "vtype":4,
      "path":"Documents\\Photos\\Other People",
      "vollabel":"home",
      "keywords":[],
      "imagename":"Bob Rosen",
      "vnetname":"\\\\fsfs\\ddb"
   },
   {
      "keywords":[
         "0"
      ],
      "imagename":"Breidbart",
      "vnetname":"\\\\fsfs\\ddb",
      "path":"Documents\\Photos\\Other People",
      "vollabel":"home",
      "vtype":4
   },
   {
      "vtype":4,
      "path":"Documents\\Photos\\Other People",
      "vollabel":"home",
      "imagename":"briarpatch@smugmug",
      "vnetname":"\\\\fsfs\\ddb",
      "keywords":[]
   }
]
[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$

My code:

#! /usr/bin/env perl

# Read the export from Thumbs Plus including keywords from filename given. Make a useful
# in-memory data structure and then store that in some useful format (JSON?).

use warnings;
use strict;
use utf8;                         # so literals and identifiers can be in UTF-8
use v5.12;                        # or later to get "unicode_strings" feature
use warnings  qw(FATAL utf8);     # fatalize encoding glitches
use open      qw(:std :utf8);     # undeclared streams in UTF-8
#use charnames qw(:full :short);  # unneeded in v5.16

use Text::CSV;
use JSON;

use Data::Dumper;   # debug

# Take keywords field from extract and turn it into an array.
# Input is semicolon-separated and terminated.
sub keywords {
    my $res = [];
    for my $kw ( split (';', $_[0]) ) {
        push (@$res, $kw) if length($kw);
    }
    return $res;
}

my $csv = Text::CSV->new ( { binary => 1, auto_diag => 1 } )
    or die "Cannot use CSV in: ".Text::CSV->error_diag();

print $ARGV[0],"\n";
open my $ifh, "<:encoding(UTF-8)", $ARGV[0]
    or die "Failed to open $ARGV[0]: $!";

# Types and names, together and in order.
my $types = [];
my $names = [];
my $ix = 0;
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "vollabel"; # Volume.label",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "skip001"; # Missing in label row
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "Volume.serialno"; # Volume.serialno",
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "vtype"; # Volume.vtype",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "vnetname"; # Volume.netname",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "Volume.filesystem"; # Volume.filesystem",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "path"; # Path.name",
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip002"; # unknown, '1'
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip003"; # unknown, '0'"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip004"; # unknown, '0'
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "date 1"; # date 1"
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "date 2"; # date 2"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip005"; # unkown large int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip006"; # unkown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "Thumbnail.width"; # Thumbnail.width",
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "Thumbnail.height"; # Thumbnail.height",
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip007"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip008"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip009"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip010"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip011"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip012"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip013"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip014"; # unknown int"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip015"; # unknown int"
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "imagename"; # Thumbnail.name",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "Thumbnail.metric1"; # Thumbnail.metric1",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "Thumbnail.metric2"; # Thumbnail.metric2",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "skip016"; # unknown empty"
$types->[$ix] = Text::CSV::IV(); $names->[$ix++] = "skip017"; # unknown, '0'"
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "keywords"; # Keywords.pkeywords",
$types->[$ix] = Text::CSV::PV(); $names->[$ix++] = "skip018"; # nothing.nothing"

$csv->types($types);
$csv->column_names($names);

my $tokeep = {
    'imagename' => 1,
    'keywords'  => 1,
    'path'      => 1,
    'vnetname'  => 1,
    'vollabel'  => 1,
    'vtype'     => 1,
};

my $tpdb = [];

while (my $colref = $csv->getline_hr($ifh)) {
    print "New line\n";
    my $err = $csv->status();
    print "status: ", ref($err),"\n";
    if ( $err ) {
        warn "On line $. status is non-zero";
    }
    $err = $csv->error_input();
    print "error_input: ", ref($err),"\n";
    if (defined $err) {
        warn "On line $. parse error in $err";
    }
    $err = $csv->error_diag();
    print "error_diag: ", ref($err),"\n";
    if ($err ne "") {
        warn "error_diag $err";
    }

    # Remove columns we don't care about
    for my $cn (keys (%$colref)) {
        if (! exists $tokeep->{$cn}) {
            delete $colref->{$cn};
        }
    }

    # Process keywords
    $colref->{'keywords'} = keywords ($colref->{'keywords'});

    # Do something
    push (@$tpdb, $colref);
    print "End of line processing\n";
}

# Save the structure built as JSON
my $json = JSON->new();
$json->indent(1);
print $json->encode ($tpdb);

exit 0;

And the test data file is:

PCD0138,,4037894171,5,\\ddb\r$,CDFS,PHOTO_CD\IMAGES,1,0,0,"1996-09-30T21:38:57","2002-10-12T00:29:25",3368960,2147483648,512,768,0,0,0,24,0,68,100,518,336,IMG0002.PCD,m000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,b00000000000000000000000000000000,,0,";",lose
home,,3643411828,4,\\fsfs\ddb,NTFS,Documents\Photos\Other People,1,0,260,"2007-11-28T02:36:22.000","2012-10-03T17:25:13.378",0,2147483648,400,400,0,0,3,24,0,172,200,518,4298,Bob Rosen,m1fff407f671f77ff67ff677f677f477f037f033f071f073f073f877fcf7fefffd756acacacacacd7560056d72bacacd72b242bd70081acd755552bd72bacd7d77a562bd756acd7d780802bd756d7d7d78080568181a5d7d7564f5681acd6d7d7,b1fff407f671f77ff67ff677f677f477f,,0,";",
home,,3643411828,4,\\fsfs\ddb,NTFS,Documents\Photos\Other People,1,0,260,"2011-03-05T18:39:06.989","2012-10-03T17:25:13.543",0,2147483648,400,400,0,0,3,24,0,172,200,518,3603,Breidbart, Seth,m1fffc07fe61ff7fff7ff777f677f677f077f077f077f077f077f877fcfffefffd756acacacacacd7560056d756d7acd74f242bd756d7acd756552bd756d7acd780802bd756d7acd780804fd781d7acd7808056818181d7d7562b5681acd7d7d7,b1fffc07fe61ff7fff7ff777f677f677f,,0,";",
home,,3643411828,4,\\fsfs\ddb,NTFS,Documents\Photos\Other People,1,0,260,"2007-11-28T02:36:23.000","2012-10-03T17:25:13.679",0,2147483648,400,400,0,0,3,24,0,172,200,518,3603,briarpatch@smugmug,m1fffc07fe61ff7fff7ff777f677f677f077f077f077f077f077f877fcfffefffd756acacacacacd7560056d756d7acd74f242bd756d7acd756552bd756d7acd780802bd756d7acd780804fd781d7acd7808056818181d7d7562b5681acd7d7d7,b1fffc07fe61ff7fff7ff777f677f677f,,0,";",

Replies are listed 'Best First'.
Re: Unable to detect error from Text::CSV
by Tux (Canon) on Jun 10, 2017 at 09:12 UTC

    The error is from perl itself, not from the CSV parser. It happens deep in the bowels of the code:

    sv_setiv (*svp, SvIV (*svp));

    because you specified types, and an empty string is not a number:

    $ perl -wE'my $int = 1 + ""'
    Argument "" isn't numeric in addition (+) at -e line 1.

    It would be (very) easy to alter the code so that the IV and NV types are forced to return 0 or 0.0 for empty and undef fields, but that would cause backward incompatibilities, and I doubt it warrants a new attribute/option. Besides, you can already guard your own code against this:

    $ perl -wE'no warnings "numeric";my $int = 1 + ""'
    $

    Or in your own script:

    while (my $colref = eval { no warnings "numeric"; $csv->getline_hr ($fh)}) {

    Also note that perl does the right thing anyway:

    say "$_ $names->[$_]\t$colref->{$names->[$_]}" for grep { $types->[$_] == Text::CSV_XS::IV } 0 .. $#$types;

    ->

    End of line processing
    Argument "" isn't numeric in subroutine entry at /pro/3gl/CPAN/Text-CSV_XS/blib/lib/Text/CSV_XS.pm line 871, <$fh> line 3.
    New line
    status:
    error_input:
    error_diag:
    2 Volume.serialno 3643411828
    3 vtype 4
    7 skip002 1
    8 skip003 0
    9 skip004 260
    12 skip005 0
    13 skip006 2147483648
    14 Thumbnail.width 400
    15 Thumbnail.height 400
    16 skip007 0
    17 skip008 0
    18 skip009 3
    19 skip010 24
    20 skip011 0
    21 skip012 172
    22 skip013 200
    23 skip014 518
    24 skip015 3603
    29 skip017 0

    Enjoy, Have FUN! H.Merijn
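
A sketch of one way to make such a warning detectable from the calling code: install a local `$SIG{__WARN__}` handler around the call and collect whatever is emitted. The helper name `with_warnings` is made up for illustration and is not part of Text::CSV; the example uses a plain numeric conversion so that it runs without any CSV input.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code block and collect any warnings it emits
# instead of letting them go to STDERR.
sub with_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = $code->();
    return ($result, \@warnings);
}

# Same situation as in the reply: an empty string used as a number.
my $empty = "";
my ($sum, $warnings) = with_warnings (sub { 1 + $empty });

printf "result=%s, warnings=%d\n", $sum, scalar @$warnings;
print "  $_" for @$warnings;
```

In the reading loop the same wrapper could surround the call, e.g. `my ($colref, $w) = with_warnings (sub { $csv->getline_hr ($ifh) });`, after which a non-empty `@$w` marks a record whose typed fields did not convert cleanly.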
Re: Unable to detect error from Text::CSV
by poj (Abbot) on Jun 10, 2017 at 07:14 UTC

    Line 3 has 33 (not 32) columns because the unquoted comma here creates an extra column:

    Breidbart, Seth,
    
    poj
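
A small sketch of the effect, assuming Text::CSV is installed. The two records are shortened, made-up stand-ins for the export line, not the real data: unquoted, the comma in the name splits it into two fields; quoted, it stays one field.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

# Shortened, illustrative records (not the full export lines).
my $unquoted = 'home,3603,Breidbart, Seth,0';
my $quoted   = 'home,3603,"Breidbart, Seth",0';

my %count;   # number of fields seen per input line
for my $line ($unquoted, $quoted) {
    $csv->parse ($line) or die "Parse failed: " . $csv->error_diag ();
    my @fields = $csv->fields ();
    $count{$line} = scalar @fields;
    printf "%d fields: %s\n", scalar @fields, join " | ", @fields;
}
```

Neither line is a parse error as far as CSV is concerned, which is why nothing is reported unless the field count itself is checked.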
Re: Unable to detect error from Text::CSV
by Tux (Canon) on Jun 10, 2017 at 08:33 UTC

    I will address the separate issues in separate posts.

    As poj already saw, your CSV data is not consistent:

    $ csv-check pm-1192449.csv
    Checked pm-1192449.csv with csv-check 1.8
    using Text::CSV_XS 1.30 with perl 5.26.0 and Unicode 9.0.0
    OK: rows: 4, columns: (32, 33)
        sep = <,>, quo = <">, bin = <0>, eol = <"\n">
    WARN: multiple column lengths:
     3 lines with 32 fields
     1 line with 33 fields

    csv-check is (also) available on github.

    As of Text::CSV_XS version 1.29 and Text::CSV version 1.95, the new strict attribute would have caught that:

    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, strict => 1 });
    my $fn = "pm-1192449.csv";
    -d "sandbox" and substr $fn, 0, 0, "sandbox/";
    say $fn;
    open my $fh, "<:encoding(utf-8)", $fn or die "$fn: $!\n";
    while (my $row = $csv->getline ($fh)) {
        #
        }

    ->

    $ perl pm-1192449.pl
    pm-1192449.csv
    # CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 3 pos 427 field 33

    Enjoy, Have FUN! H.Merijn
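
A sketch of where such an error can be picked up in code, assuming a Text::CSV recent enough to know `strict`. Because `getline` returns false both at end of file and on a parse error, checks placed inside the loop body never see the failing record; the distinction has to be made after the loop, here with `eof` and the list form of `error_diag`. The in-memory data is invented for the example.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, strict => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag ();

# Illustrative data: the third record has one field too many.
my $data = "a,b,c\nd,e,f\ng,h,i,j\nk,l,m\n";
open my $fh, "<", \$data or die "Cannot open in-memory file: $!";

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}

# The loop has ended: find out whether that was end of file or an error.
my ($code, $msg) = $csv->error_diag ();
my $failed = !$csv->eof;

if ($failed) {
    printf "Stopped after %d good rows: error %d (%s)\n",
        scalar @rows, $code, $msg;
}
else {
    printf "Read all %d rows\n", scalar @rows;
}
```

With `auto_diag` set, as in the original script, the diagnostic is additionally printed by the module itself; the check after the loop is what lets the program react to it.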
Re: Unable to detect error from Text::CSV
by dd-b (Monk) on Jun 09, 2017 at 23:08 UTC
    Had the brilliant idea that this might be caused by my setting auto_diag; but I took that out and the problem persists as shown, sadly. So much for "brilliant".