Note it wasn't funny ampersands in the data, but an actual utf-8 character (the upside down e, U+0259 LATIN SMALL LETTER SCHWA). (darn conversions!)

    #!perl -w
    use warnings;
    use strict;

    {
        use utf8;
        my $string = 'ə';  # this is a schwa in UTF-8, darned handy in linguistics
        print length $string,"\t",$string, "\n";

        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
        # seems like it should print "1" here... but it prints 2!
    }

    {
        my $string = 'ə';
        print length $string,"\t",$string, "\n";

        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
    }
    __DATA__
    ə
    ə
Here's the results (as pre):
    1	ə
    2	ə
    2	ə
    2	ə

It's the second line that really surprises me... shouldn't that be a '1'? The only apparent difference is that it was read off a filehandle. How can I "reset" that data to be utf8?
Here's my version of Perl (I used pre tags so that d/l code would work!):
    C:\>perl -v

    This is perl, v5.6.1 built for MSWin32-x86-multi-thread
    (with 1 registered patch, see perl -V for more detail)

    Copyright 1987-2001, Larry Wall

    Binary build 633 provided by ActiveState Corp. http://www.ActiveState.com
    Built 21:33:05 Jun 17 2002

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.com/, the Perl Home Page.

Anybody have any idea what's wrong here or why it gets the length wrong?
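For reference, here is a minimal sketch of the byte-versus-character distinction that seems to be in play. It assumes Perl 5.8 or later with the core Encode module, which is not available on the 5.6.1 build above, so treat it as an illustration rather than a fix for this install. The schwa U+0259 is the two bytes 0xC9 0x99 in UTF-8, and a string that has not been decoded gets counted as those two bytes, which would be consistent with the "2" seen for the line read from DATA. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);    # assumption: Perl 5.8+, Encode is not in 5.6.1

# The UTF-8 encoding of U+0259 written out as two raw bytes,
# standing in for what an undecoded filehandle read would hand back.
my $bytes = "\xC9\x99";

# Decode the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

print length($bytes), "\n";    # 2 (counted as bytes)
print length($chars), "\n";    # 1 (counted as characters)
```

On a Perl where this runs, the same decode step could presumably be applied to each line after the chomp, but that is untested against the script above.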
In reply to Setting UTF-8 mode on filehandle reads? by jkahn