When I try to read the header line of a CSV file that I opened with Unicode encoding (and which actually has some non-ASCII characters in it, though not, I think, in the header line), I get the error:

    Strings with code points over 0xFF may not be mapped into in-memory file handles
    readline() on closed filehandle $h at /usr/lib/perl5/site_perl/5.22/i686-cygwin-threads-64int/Text/CSV_XS.pm line 830.

*I'm* not doing file IO on any strings, and the code line given is in Text::CSV not my code.
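
For reference, here is a minimal sketch of how the first message can arise without Text::CSV at all: opening an in-memory filehandle on a string that has already been decoded and contains a code point above 0xFF. That the module does something like this internally with the header line is only an assumption, based on the error pointing into CSV_XS.pm; `open_in_memory` is a hypothetical helper written for this illustration, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: try to open an in-memory filehandle on a copy of
# $text and return the handle, or undef if open fails.
sub open_in_memory {
    my ($text) = @_;
    my $ok = open my $fh, "<", \$text;
    return $ok ? $fh : undef;
}

my $bytes = "a,b,c\n";            # every code point is <= 0xFF
my $wide  = "a,\x{263A},c\n";     # U+263A is a code point above 0xFF

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $fh_bytes = open_in_memory($bytes);
my $fh_wide  = open_in_memory($wide);

print "bytes: ", (defined $fh_bytes ? "opened" : "failed"), "\n";
print "wide:  ", (defined $fh_wide  ? "opened" : "failed"), "\n";
print "warning: $_" for @warnings;
```

If the second open fails here, any later readline on that handle would complain about a closed filehandle, which would match the second half of the message above.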

There's a "Unicode" section to the doc for Text::CSV and I think I did what it said. I verified that turning *off* unicode for that file eliminates this error message. (Since there are actual non-ASCII characters in the file that must be read and comprehended later that's not a long-term solution.)

Any ideas? The symptoms look like Unicode just doesn't work, but the Unicode section in the docs seems pretty clearly to be based on the assumption that it does, and it must be pretty commonly used.

Not much to my code so far, just the start of this bit. It's the $csv->header($ifh) call that throws this error.

    #! /usr/bin/env perl

    # Read the export from Thumbs Plus including keywords from filename given.

    use warnings;
    use strict;
    use utf8;      # so literals and identifiers can be in UTF-8
    use v5.12;     # or later to get "unicode_strings" feature
    use warnings  qw(FATAL utf8);     # fatalize encoding glitches
    #use open      qw(:std :utf8);    # undeclared streams in UTF-8
    #use charnames qw(:full :short);  # unneeded in v5.16

    use Text::CSV;
    use Data::Dumper; # debug

    my $csv = Text::CSV->new ( { binary => 1 } )
        or die "Cannot use CSV in: ".Text::CSV->error_diag();

    print $ARGV[0],"\n";
    open my $ifh, "<:encoding(UTF-8)", $ARGV[0] or die "$ARGV[0]: $!";
    print "Point a\n";
    # Returns "the instance" -- of what? Do I care?
    my $thingie = $csv->header ($ifh);
    print "Point b\n";
    print Dumper($csv), "\n";

The first three lines (long lines) of the input file are:

    $ head -3 /cygdrive/p/Photos/ThumbsPlus/Thumbs.txt
    "Volume.label","Volume.serialno","Volume.vtype","Volume.netname","Volume.filesystem","Path.name",,"Thumbnail.checksum","Thumbnail.width","Thumbnail.height","Thumbnail.horiz_res","Thumbnail.vert_res","Thumbnail.colortype","Thumbnail.colordepth","Thumbnail.gamma","Thumbnail.thumbnail_width","Thumbnail.thumbnail_height","Thumbnail.thumbnail_type","Thumbnail.thumbnail_size","Thumbnail.name","Thumbnail.metric1","Keywords.pkeywords",
    PCD0138,,4037894171,5,\\ddb\r$,CDFS,PHOTO_CD\IMAGES,1,0,0,"1996-09-30T21:38:57","2002-10-12T00:29:25",3368960,2147483648,512,768,0,0,0,24,0,68,100,518,336,IMG0002.PCD,m000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,b00000000000000000000000000000000,,0,";",
    PCD0138,,4037894171,5,\\ddb\r$,CDFS,PHOTO_CD\IMAGES,1,0,0,"1996-09-30T21:38:57","2002-10-12T00:29:25",3354624,2147483648,512,768,0,0,0,24,0,68,100,518,336,IMG0003.PCD,m000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000,b00000000000000000000000000000000,,0,";",

In reply to Text::CSV on Unicode file by dd-b
