igoryonya has asked for the wisdom of the Perl Monks concerning the following question:

When I parse a csv file, I get the following error:
# CSV_PP ERROR: 2023 - EIQ - QUO character not allowed @ rec 4 pos 105 field 7
The data:
147;lakjfh lkjsfh ehjd;134-324-730 31;291;24.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;96764128476876487264
148;lkjf fdkas fa;123-105-878 17;1;23.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
149;sdfg gdsgsdhsds shsddf;104-424-501 02;146;20.04.2020;15 000,00;severo-vostocnoe otdelenie pao "sberbank";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
150;dfgsdfgsdg sdgsdgsdgsdg sdfgdsgs;095-504-250 68;68;17.04.2020;15 000,00;"aziatsko-tihookeanskiy bank" (pao) g. blagovesensk;3473446334;280101001;23452344637833423542;"aziatsko-tihookeanskiy bank" (pao) g. blagovesensk;098676868;9837542asdfaas529987;;23236726352762456346
151;sdfgds fdsgd ssdgsd;108-437-022 37;258;23.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
152;bgerghe egertge ertgeer;074-073-043 41;128;20.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
153;lskdjfa sflaskjfd aslkdfjaslf;151-533-432 32;33;22.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
154;lasfnf fsdafasdfas afs;134-092-549 45;21;23.04.2020;5 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
155;asdfasf asdf asdfasfd;110-497-874 55;50;24.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
156;sadfasf asdf asdfas;456-978-244 89;117;17.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
157;asdfasfwer asfdasfs sadf;139-220-696 59;26;21.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
158;lksj ljlkjlkjl lkjljlk;133-087-587 59;262;23.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
159;fghd g dfghdhdfgh;141-008-388 12;241;22.04.2020;30 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
160;lkkljh kljhk kjh;123-650-136 21;20;20.04.2020;15 000,00;severo-vostocnoe otdelenie № 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
161;sdgfd sgdsgdsg sdfgds;154-978-292 22;93;17.04.2020;15 000,00;severo-vostocnoe otdelenie 8645 pao sberbank;4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;23236726352762456346
162;lkdasj alsdfka sflas;112-031-647 83;61;17.04.2020;15 000,00;severo-vostocnoe otdelenie 8645 pao sberbank;7707083893;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;89686898634243972345;;23236726352762456346
Code used:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Text::CSV;

my $codepage = 'utf8';
binmode(STDOUT, ":$codepage");
binmode(STDIN,  ":$codepage");

my $fileTable = shift;
my $CSV_H = Text::CSV->new({
    sep_char           => ";",
    binary             => 1,
    blank_is_undef     => 1,
    empty_is_undef     => 1,
    allow_whitespace   => 0,
    allow_loose_quotes => 1,
});

if (open my $TBL_H, "<:encoding($codepage)", $fileTable) {
    while (my $row = $CSV_H->getline($TBL_H)) {
        # process @$row here
    }
    $CSV_H->eof or $CSV_H->error_diag();
    close $TBL_H;
}
Command to execute:
csv-prog.pl csv-file.csv

I realize that this CSV text is not completely valid, but there has to be a way to bypass the error and parse the file anyway. Otherwise, I would have to write my own CSV library, which I would like to avoid.

P.S.

I added a callback after the handler initialisation:
$CSV_H->callbacks(error => \&onerror);
and added a function:
sub onerror {
    print '=' x 100, "\n";
    printf "ERROR_INPUT: %s\n", $CSV_H->error_input();
    printf "EOF: %s\n", ($CSV_H->eof) ? ('EOF') : ('ERROR');
}
but it doesn't seem to work: none of the print statements in the sub produce any output.
Any suggestions?

P.P.S.

I've figured it out.
I think it was because I needed to set auto_diag => 1.
Now it works.
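For reference, here is a minimal sketch of an error callback that does get called, using made-up sample data and the same ; separator. The callback receives the five values that error_diag returns in list context (error number, message, position, record number, field number). getline still returns undef for the failing record, so the loop ends there.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Handle;    # filehandle methods (harmless if already loaded)
use Text::CSV;

# Made-up sample: line 2 has a quoted value followed by more text,
# the same shape as field 7 of record 150 in the question.
my $data = qq{1;plain;ok\n2;"quoted" tail;bad\n3;plain;ok\n};
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $csv = Text::CSV->new ({
    sep_char  => ";",
    binary    => 1,
    auto_diag => 1,    # without this the error callback is never invoked
});

my @errors;    # one [code, message, position, record, field] entry per error
$csv->callbacks (error => sub {
    # Receives the same five values error_diag () returns in list context
    my ($code, $msg, $pos, $recno, $fldno) = @_;
    push @errors, [ $code, $msg, $pos, $recno, $fldno ];
    warn "CSV error $code: $msg\n";
    return;
});

my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;    # getline returns undef at the first bad record
}
close $fh;

printf "%d good row(s) read, %d error(s) reported\n", scalar @rows, scalar @errors;
```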

Now I am trying to figure out another issue:
In the error handler callback, I fix the string returned by the error_input() method, but I cannot figure out how to pass the fixed string back to the input so that it is returned to the loop for reparsing, or how to return an already reparsed record from the callback to the loop.
How do I return the fixed result from the callback to the working loop?

Re: problem with csv parsing
by hippo (Archbishop) on May 20, 2020 at 11:24 UTC

    This removes the error.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Text::CSV;

    my $fileTable = shift;
    my $codepage  = 'utf8';
    my $CSV_H = Text::CSV->new ( {
        sep_char         => ";",
        binary           => 1,
        blank_is_undef   => 1,
        empty_is_undef   => 1,
        allow_whitespace => 0,
        quote_char       => undef
    } );
    if (open my $TBL_H, "<:encoding($codepage)", $fileTable) {
        while (my $row = $CSV_H->getline ($TBL_H)) { }
        $CSV_H->eof or $CSV_H->error_diag ();
        close $TBL_H;
    }
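A sketch of what quote_char => undef does to record 150 from the question, the one that triggered the error. It additionally sets escape_char => undef, an assumption beyond the code above, so that the double quote is treated purely as data; the quotes then stay inside the field value instead of being parsed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Record 150 from the question: field 7 starts with a quoted bank name
# that is followed by more text, which a quote-aware parser rejects.
my $line = q{150;dfgsdfgsdg sdgsdgsdgsdg sdfgdsgs;095-504-250 68;68;17.04.2020;15 000,00;"aziatsko-tihookeanskiy bank" (pao) g. blagovesensk;3473446334;280101001;23452344637833423542;"aziatsko-tihookeanskiy bank" (pao) g. blagovesensk;098676868;9837542asdfaas529987;;23236726352762456346};

my $csv = Text::CSV->new ({
    sep_char       => ";",
    binary         => 1,
    blank_is_undef => 1,
    empty_is_undef => 1,
    quote_char     => undef,    # no quoting: every ";" ends a field
    escape_char    => undef,    # assumption: treat '"' as plain data too
});

$csv->parse ($line) or die "parse failed: " . $csv->error_diag . "\n";
my @fields = $csv->fields;

# The quotes survive as ordinary characters inside the value
print scalar (@fields), " fields; field 7 = $fields[6]\n";
```

The trade-off is that a value that legitimately contains ; inside quotes would now be split into two fields.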

    If your input is, as you say, not completely valid then you will have to consider the possibility that further problems may arise. Pre-processing the input into a valid form might be a good idea.
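One possible pre-processing step, sketched under the assumption that ; never occurs inside a value in this export: treat every double quote as literal data, double it, and wrap the field in quotes, which turns each line into valid CSV. normalize_line is a hypothetical helper written for this example, not part of Text::CSV.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, assuming ";" never occurs inside a value: every '"'
# is treated as literal data and escaped by doubling (RFC 4180 style).
sub normalize_line {
    my ($line) = @_;
    chomp $line;
    my @fields = split /;/, $line, -1;    # -1 keeps trailing empty fields
    for my $f (@fields) {
        next unless $f =~ /"/;            # leave quote-free fields alone
        $f =~ s/"/""/g;                   # double each embedded quote
        $f = qq{"$f"};                    # then quote the whole field
    }
    return join (";", @fields) . "\n";
}

print normalize_line (qq{150;x;"aziatsko-tihookeanskiy bank" (pao) g. blagovesensk;;1\n});
```

Lines cleaned this way can go to a parser with the default quote_char, which returns the bank names with their quotes intact. A field that was properly quoted in the source would keep its surrounding quotes as data, so this only suits an export like this one.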

    PS. Are you sure your input is really utf8?
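One way to check that without relying on how the text displays is to decode the raw bytes strictly with the core Encode module. A minimal sketch; is_valid_utf8 is a hypothetical helper name, and the script expects the file name as its first argument:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8, else 0
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK flag may overwrite its input
    return eval { decode ("UTF-8", $copy, FB_CROAK); 1 } ? 1 : 0;
}

# Read the file as raw bytes (no :encoding layer) and test it
if (my $file = shift) {
    open my $fh, "<:raw", $file or die "$file: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    print "$file: ", (is_valid_utf8 ($bytes) ? "valid" : "NOT valid"), " UTF-8\n";
}
```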

      Thank you, it worked partially: it didn't split every value into the fields I expected, but that's OK. I think I will feed the parser line by line using the CSV handler config from my example and, if an error occurs, reparse that line with the config you suggested, recreate the CSV from the parsed fields, then reparse it again with my original config, and so on.
      It was saved as UTF-8 from a bookkeeping program. I am sure that it's UTF-8, because it contains Russian characters, which wouldn't display correctly in any other codepage.
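The line-by-line plan above can be sketched with two parser objects: a strict one tried first and a fallback with quoting switched off, as in hippo's reply. Doing the retry in the loop also avoids having to pass data back out of the error callback. This is a simplification of the plan: it uses the fallback fields directly instead of regenerating CSV and reparsing, and it assumes every record sits on a single physical line. parse_record is a hypothetical helper.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;

# Strict parser: the options from the question, minus allow_loose_quotes
my $strict = Text::CSV->new ({
    sep_char       => ";",
    binary         => 1,
    blank_is_undef => 1,
    empty_is_undef => 1,
});

# Fallback parser: quoting switched off as in hippo's reply; escape_char
# is also cleared here (an assumption) so '"' is plain data
my $loose = Text::CSV->new ({
    sep_char       => ";",
    binary         => 1,
    blank_is_undef => 1,
    empty_is_undef => 1,
    quote_char     => undef,
    escape_char    => undef,
});

# Hypothetical helper: returns an array ref of fields, or undef when
# neither configuration accepts the line. Assumes one record per line.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    return [ $strict->fields ] if $strict->parse ($line);
    return [ $loose->fields ]  if $loose->parse ($line);
    return;
}

if (my $file = shift) {
    open my $fh, "<:encoding(UTF-8)", $file or die "$file: $!";
    while (defined (my $line = <$fh>)) {
        my $row = parse_record ($line)
            or do { warn "line $.: could not parse\n"; next };
        # ... process @$row here ...
    }
    close $fh;
}
```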
      I added a callback after the handler initialisation:
      $CSV_H->callbacks(error => \&onerror);
      and added a function:
      sub onerror {
          print '=' x 100, "\n";
          printf "ERROR_INPUT: %s\n", $CSV_H->error_input();
          printf "EOF: %s\n", ($CSV_H->eof) ? ('EOF') : ('ERROR');
      }
      but it doesn't seem to work: none of the print statements in the sub produce any output.

        That is the hard way to get neat errors. Just add auto_diag => 2.
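A sketch of what auto_diag => 2 changes, using made-up sample data: the first parse error becomes fatal, so wrapping the read loop in eval lets the script catch it, report the message and decide what to do next.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Handle;
use Text::CSV;

# Made-up sample whose second line has the same defect as record 150
my $data = qq{1;plain;ok\n2;"quoted" tail;bad\n3;plain;ok\n};
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $csv = Text::CSV->new ({
    sep_char  => ";",
    binary    => 1,
    auto_diag => 2,    # greater than 1: die on errors instead of warning
});

my @rows;
my $ok = eval {
    while (my $row = $csv->getline ($fh)) {
        push @rows, $row;
    }
    1;
};
my $err = $@;    # save the message before anything else can reset $@
close $fh;

if ($ok) { print "parsed all rows\n" }
else     { print "stopped after ", scalar (@rows), " row(s): $err" }
```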

        Another approach might be to use csv-check and get the options right before you go for the real thing:

        $ csv-check -v1 test.ssv
        Checked test.ssv with csv-check 2.05
        using Text::CSV_XS 1.42 with perl 5.30.0 and Unicode 12.1.0
        test.ssv record 1 at line 1/104 - 2034 - EIF - Loose unescaped quote
            |147;lakjfh lkjsfh ehjd;134-324-730 31;291;24.04.2020;15 000,00;severo-vostocnoe otdelenie \x{02116} 8645 pao "sberbank rossii";4243972345;347636334;23452347344633423542;severo-vostocnoe otdelenie N8645 pao sberbank g. magadan;896868986;98375423895239529987;;96764128476876487264\n|
            |                                                                                                             ▲                                                                                                                                                                          |
        # CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 0 pos 104 field 2
        
        $ csv-check -v1 test.ssv --allow-loose-quotes
        Checked test.ssv with csv-check 2.05
        using Text::CSV_XS 1.42 with perl 5.30.0 and Unicode 12.1.0
        OK: rows: 16, columns: 2
            sep = <,>, quo = <">, bin = <1>, eol = <"\n">
        

        Once you get the options so that you are able to parse your (invalid) data, show the used attributes with -L:

        $ csv-check -v1 test.ssv --allow-loose-quotes -L
        allow_loose_escapes : 0
        allow_loose_quotes : 1
        allow_unquoted_escape : 0
        allow_whitespace : 0
        always_quote : 0
        auto_diag : 1
        binary : 1
        blank_is_undef : 0
        callbacks : (undef)
        decode_utf8 : 1
        diag_verbose : 0
        empty_is_undef : 0
        eol :
        escape_char : "
        escape_null : 1
        formula : diag
        keep_meta_info : 1
        quote : (undef)
        quote_binary : 1
        quote_char : "
        quote_empty : 0
        quote_space : 1
        sep : (undef)
        sep_char :
        strict : 0
        types : (undef)
        undef_str : (undef)
        verbatim : 0

        Or just show the options that changed the defaults (note that csv-check sets some sane attributes that are not default):

        $ csv-check -v1 test.ssv --allow-loose-quotes -X
        allow_loose_quotes : 1
        auto_diag : 1
        binary : 1
        formula : diag
        keep_meta_info : 1

        So, your code would now be:

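        use strict;
        use warnings;
        use Text::CSV_XS;    # Text::CSV_XS is much faster than Text::CSV

        my $fileTable = shift;
        my $CSV_H = Text::CSV_XS->new ({
            sep_char           => ";",
            binary             => 1,
            blank_is_undef     => 1,
            empty_is_undef     => 1,
            allow_whitespace   => 0,
            allow_loose_quotes => 1,    # Should work. If not, maybe a bug in Text::CSV
            auto_diag          => 2,    # Added
            });

        open my $TBL_H, "<:encoding(UTF-8)", $fileTable or die "$fileTable: $!";
        while (my $row = $CSV_H->getline ($TBL_H)) {
            # process @$row here; with auto_diag => 2 a parse error dies
        }
        close $TBL_H;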

        Enjoy, Have FUN! H.Merijn

        If your error callback isn't printing anything, that means it isn't being called, which means that there are no errors. Surely that's a good thing?