in reply to CSV_XS ERROR: 2027 - EIQ - Quoted field not terminated @ pos 408

You said:
The difference between the csv2xls.pl program choking or not choking on the final output file depends only upon whether lines of data were filtered out. If I remove the "or next" I get back my original file, and it converts to Excel just fine, albeit loaded with far too much data.

Well, I'm sorry, but you're going to have to prove that, by posting a runnable snippet of code with appropriate data that demonstrates your problem.

I tried writing a little snippet of code with data to replicate what you describe, and I didn't see any problem:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;  # this is what reported an error, right?

my $target = ( @ARGV ) ? shift : 'egik';
my $csv = Text::CSV_XS->new;
my @keep;
while ( my $row = $csv->getline( \*DATA )) {
    $row->[4] =~ /[$target]/ or next;
    push @keep, $row;
}
print join( " : ", @$_ )."\n" for ( @keep );
__DATA__
a,b,"c,c",d,e,f,g,h
b,c,d,e,"f,f",g,h,i
c,d,e,f,g,h,"i,i",j
"d,d",e,f,g,h,i,j,k
e,f,g,h,i,j,k,"l,l"
f,"g,g",h,i,j,k,l,m
g,h,i,"j,j",k,l,m,n
h,i,j,k,l,"m,m",n,o

I gather your system has both Text::CSV and Text::CSV_XS installed (and the former uses the latter when it's available), but since you say CSV_XS reported the error, I used that explicitly. (You can set it back to Text::CSV if you want.)

By default, the test script (let's call it "test_script") outputs four lines from the test data. If I put one or more matchable letters as a command line arg, it will output as many lines as match the given letters (e.g. test_script f will output 1 line, test_script e-l will output all 8 lines). No problems with quoted fields, even though each line has a quoted field somewhere in it.

So see if you can post a similar snippet that proves the problem you are talking about.

Re^2: CSV_XS ERROR: 2027 - EIQ - Quoted field not terminated @ pos 408
by Anonymous Monk on May 22, 2010 at 13:04 UTC
    Text::CSV is not what is reporting the error. csv2xls.pl is.
    Here is a code snippet that duplicates the problem.

    By the way, the data originators paste lines and/or paragraphs from Word into a form that puts them into a proprietary database within a program called DOORS. DOORS exported the database as a comma-separated CSV file. The CSV has just under 9K rows, with lots of embedded newlines and non-ASCII characters.

    use Text::CSV;
    use UTF8BOM;
    use utf8;
    use Encode;
    use strict;

    my $origfile = 'csv.csv';
    my $infile = 'csv_in.csv';
    `cp -f $origfile $infile`;
    my $outfile = 'csv_out.csv';
    my $in_fh;
    #`cp csv.csv $infile`;
    UTF8BOM->remove_from_file($infile);
    my @rows;
    my $csv = Text::CSV->new ( { binary => 1 } )
        or die "Cannot use CSV: ".Text::CSV->error_diag ();
    open my $in_fh, "<:encoding(utf8)", "$infile" or die (__LINE__.": \n");
    while ( my $row = $csv->getline( $in_fh ) ) {
        $row->[4] =~ m/Hardware/ or next; # Causes csv2xls.pl problems.
        push (@rows, ($row));
    }
    $csv->eof or $csv->error_diag();
    close $in_fh;
    $csv->eol ("\r\n");
    my $csv_out;
    my $num_rows = @rows;
    print ("There are $num_rows rows\n");
    open $csv_out, ">:encoding(utf8)", "$outfile" or die(__LINE__.": \n");
    $csv->print ($csv_out, $_) for @rows;
    close $csv_out or die (__LINE__.": \n");
    unlink("csv_out.xls");
    `csv2xls.pl -u csv_out.csv`;

      Text::CSV is not what is reporting the error. csv2xls.pl is.

      The title of this thread is "CSV_XS ERROR:..." -- I take that to mean that the Text::CSV_XS module is what emitted the error, regardless of which perl script invoked that module.

      Here is a code snippet that duplicates the problem.

      But there is no way to "duplicate the problem" if there's no sample data available that actually causes the problem. Where's the data?

      I know you don't want to post your entire actual input file, and no one here wants you to do that. The problem for you is to locate the point in the file where the problem occurs, and show a sample that contains just that part, or something that is equivalent to it and also causes the same error report.

      It's sad that Text::CSV(_XS) doesn't make it easy to locate the source of the problem in your data. The error message you gave as the title of this thread says "@ pos 408". As far as I can tell from the Text::CSV_XS documentation, "pos" is the character position within the record being parsed at which the parser gave up, not a count of fields from the beginning of the file, so by itself it doesn't tell you which record is at fault. You still have to work out how many records lie between the start of the file and the problem. (Note that I'm referring to "records", not "lines", given that some records include embedded line-breaks.)

      One thing to try is to remove initial records from the input data until the error goes away. There should be one record in particular such that you get the error (with a low "pos" number) when it is at the beginning of the input, and no error once you remove it from the data.
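
A minimal sketch of one way to narrow this down without editing the file by hand: count the records that getline parses successfully, and report the number of the first one that fails together with what error_diag returns in list context. The sample data and the sub name first_bad_record are hypothetical; the parser here is deliberately created without binary => 1 so that the embedded newline in the second record triggers an error. That is only a stand-in for whatever is wrong in the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: the second record has a newline inside a
# quoted field, which a parser without binary => 1 should reject.
my $sample = qq{a,b,c\n1,"x\ny",3\n4,5,6\n};

# Returns an empty list if all records parse; otherwise returns
# (record_number, error_code, error_message, position).
sub first_bad_record {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $csv  = Text::CSV_XS->new;   # binary deliberately left off
    my $good = 0;
    $good++ while $csv->getline($fh);
    return if $csv->eof;            # stopped at end of input, not on an error
    my ($code, $msg, $pos) = $csv->error_diag;
    return ($good + 1, $code, $msg, $pos);
}

my @bad = first_bad_record($sample);
if (@bad) {
    printf "record %d failed: %s - %s \@ pos %s\n", @bad;
}
else {
    print "all records parsed\n";
}
```

To use it on a real file, open the file instead of the in-memory string and create the parser with the same options as the script that fails. The record number it reports is the one to cut out and post.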

      Still, the real puzzle, according to your initial description, is why there's an error when you try to do "next" to skip some records, but no error when you don't use "next". Assuming there's an error in the data, it should occur in both cases, and whether you use "next" or not should have nothing to do with finding the error.

      So either your initial description is wrong (e.g. there's something else different about the two cases, besides the presence/absence of "next"), or there's something really strange about the data, which you haven't shown.

        graff suggested how I can get the offending data. Found it! I got it down to 2 lines.
        "vv1","vv2","vv3","vv4","vv5","vv6","vv7","vv8","vv9","vv10","vv11" 251,"a"," Hardware .","Hardware ."," Hardware ","d",,,,,

        Please see if it gives you the error below, too.
        C:\temp>a01.pl
        There are 1 rows
        # CSV_XS ERROR: 2027 - EIQ - Quoted field not terminated @ pos 44
        2027EIQ - Quoted field not terminated44 at ./csv2xls line 132, <> line 1.