Re: Text::CSV on Unicode file

by dd-b (Monk)
on Jun 08, 2017 at 04:17 UTC


in reply to Text::CSV on Unicode file

Just for drill, I copied the script and the small test file to a different system to see if the problem reproduces.

It does. The original environment is Cygwin, which I think anybody familiar with it is always just a *little* suspicious of, but after copying the two test files to a FreeBSD box, the problem reproduces exactly.

[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ cpan -D Text::CSV_XS
Loading internal null logger. Install Log::Log4perl for logging messages
Reading '/home/ddb/.cpan/Metadata'
  Database was generated on Wed, 07 Jun 2017 21:41:02 GMT
Text::CSV_XS
-------------------------------------------------------------------------
        (no description)
        H/HM/HMBRAND/Text-CSV_XS-1.29.tgz
        /usr/local/lib/perl5/site_perl/mach/5.24/Text/CSV_XS.pm
        Installed: 1.29
        CPAN:      1.29  up to date
        H.Merijn Brand (HMBRAND)
        h.m.brand@xs4all.nl

[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ cpan -D Text::CSV
Loading internal null logger. Install Log::Log4perl for logging messages
Reading '/home/ddb/.cpan/Metadata'
  Database was generated on Wed, 07 Jun 2017 21:41:02 GMT
Text::CSV
-------------------------------------------------------------------------
        (no description)
        I/IS/ISHIGAKI/Text-CSV-1.95.tar.gz
        /usr/local/lib/perl5/site_perl/Text/CSV.pm
        Installed: 1.95
        CPAN:      1.95  up to date
        Kenichi Ishigaki (ISHIGAKI)
        ishigaki@cpan.org

[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ ls -l play-thumbs.txt readtpexport.pl
-rwxrwxr-x 1 ddb ddb 874 Jun 7 18:41 play-thumbs.txt
-rwxrwxr-x 1 ddb ddb 876 Jun 7 21:05 readtpexport.pl
[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ file play-thumbs.txt
play-thumbs.txt: UTF-8 Unicode (with BOM) text, with very long lines
[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$
[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ ./readtpexport.pl play-thumbs.txt
play-thumbs.txt
Point a
Strings with code points over 0xFF may not be mapped into in-memory file handles
readline() on closed filehandle $h at /usr/local/lib/perl5/site_perl/mach/5.24/Text/CSV_XS.pm line 830.
 at ./readtpexport.pl line 25.
[ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$

Re^2: Text::CSV on Unicode file
by Tux (Canon) on Jun 09, 2017 at 10:07 UTC
    [ddb@playpen ~/smbshare/Documents/work/tpdbfix/app]$ file play-thumbs.txt
    play-thumbs.txt: UTF-8 Unicode (with BOM) text, with very long lines

    That (with BOM) was the trigger! I can now reproduce it.

    I have fixed this for version 1.31. I will need to add some tests for that before I release.

    With the current versions, the workaround is simply not to use :encoding(utf-8) on open, because the header method then no longer sees the BOM as bytes and fails to recognize it.
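
To see why the layer on open matters here, the following is a minimal sketch using only core Perl, not Text::CSV_XS itself. It assumes the relevant difference is simply what the reader gets as the first characters of the file: with a raw handle the BOM arrives as the three bytes EF BB BF, while with :encoding(UTF-8) it has already been decoded into the single character U+FEFF. The helper names (write_bom_file, first_line, starts_with_bom_bytes) and the sample CSV contents are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 file that starts with a BOM (bytes EF BB BF).
# The CSV contents are invented for this illustration.
sub write_bom_file {
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;                                  # write raw bytes
    print {$fh} "\xEF\xBB\xBF", "id,name\n1,caf\xC3\xA9\n";
    close $fh or die "close $path: $!";
    return $path;
}

# Return the first line of the file as read through the given PerlIO layer.
sub first_line {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "open $path: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Does the string start with the three-byte UTF-8 BOM?
sub starts_with_bom_bytes {
    my ($s) = @_;
    return substr($s, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
}

my $path    = write_bom_file();
my $raw     = first_line($path, ':raw');
my $decoded = first_line($path, ':encoding(UTF-8)');

printf "raw:     BOM bytes seen: %s\n", starts_with_bom_bytes($raw)     ? 'yes' : 'no';
printf "decoded: BOM bytes seen: %s\n", starts_with_bom_bytes($decoded) ? 'yes' : 'no';
printf "decoded: first char is U+%04X\n", ord substr($decoded, 0, 1);
```

The raw read reports the BOM bytes, while the decoded read does not and instead starts with U+FEFF. Under that assumption, the workaround described above amounts to opening the file with a plain `open my $fh, "<", $file` and leaving BOM detection and decoding to the header method.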


    Enjoy, Have FUN! H.Merijn
      Outstanding! Glad I finally was able to describe it precisely enough that you could find it. I assume this means the request for a zipped copy of stuff is now irrelevant? If still useful I could certainly do it. My own further investigation discovered bugs I still have to report on Thumbs Plus -- the database export CSV header and body lines aren't compatible. Sigh. But that's not in any way *your* problem :-) . But I've got a new Text::CSV one, too, that I'll post shortly in a new thread.