jkahn has asked for the wisdom of the Perl Monks concerning the following question:
Note: it wasn't funny ampersands in the data, but an actual UTF-8 character (the upside-down e, U+0259 LATIN SMALL LETTER SCHWA). (darn conversions!)

    #!perl -w
    use warnings;
    use strict;

    {
        use utf8;
        my $string = 'ə'; # this is a schwa in UTF-8, darned handy in linguistics
        print length $string,"\t",$string, "\n";

        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
        # seems like it should print "1" here... but it prints 2!
    }

    {
        my $string = 'ə';
        print length $string,"\t",$string, "\n";

        my $filestring = <DATA>;
        chomp $filestring;
        print length $filestring, "\t", $filestring, "\n";
    }

    __DATA__
    ə
    ə
Here are the results:

    1	ə
    2	ə
    2	ə
    2	ə

It's the second line that really surprises me... shouldn't that be a '1'? The only apparent difference is that it was read off a filehandle. How can I "reset" that data to be utf8?
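To make the byte-versus-character distinction behind that 2 concrete, here is a minimal sketch. It assumes Perl 5.8 or later with the core Encode module, which is not the 5.6.1 build this post was written on, and the variable names are only for illustration. A raw read hands back the two bytes that encode the schwa; decoding them yields one character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that encode U+0259 in UTF-8, as a raw (undecoded)
# filehandle read would deliver them.
my $bytes = "\xC9\x99";

# Decode the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

print length($bytes), "\n";          # counts bytes
print length($chars), "\n";          # counts characters
printf "U+%04X\n", ord($chars);      # code point of the decoded character
```

On a Perl where this runs, the first length counts bytes and the second counts characters, which is the same gap as between lines 1 and 2 of the results.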
Here's my version of Perl (I used pre tags so that d/l code would work!):
    C:\>perl -v

    This is perl, v5.6.1 built for MSWin32-x86-multi-thread
    (with 1 registered patch, see perl -V for more detail)

    Copyright 1987-2001, Larry Wall

    Binary build 633 provided by ActiveState Corp. http://www.ActiveState.com
    Built 21:33:05 Jun 17 2002

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using `man perl' or `perldoc perl'. If you have access to the
    Internet, point your browser at http://www.perl.com/, the Perl Home Page.

Anybody have any idea what's wrong here or why it gets the length wrong?
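For readers on a newer Perl, one possible way to get character semantics on reads is a PerlIO encoding layer on the filehandle. This is only a sketch under assumptions: it needs Perl 5.8 or later (PerlIO layers are not available in the 5.6.1 build shown above), and it uses an in-memory handle as a stand-in for a real file so that it is self-contained.

```perl
use strict;
use warnings;

# Stand-in for a real file: the raw UTF-8 bytes of a schwa plus a newline.
my $raw = "\xC9\x99\n";

# Open with an encoding layer so that reads return decoded characters.
open(my $fh, '<:encoding(UTF-8)', \$raw) or die "open failed: $!";
my $filestring = <$fh>;
chomp $filestring;
close $fh;

# For an already-open handle such as DATA, the same layer can be applied
# with binmode(DATA, ':encoding(UTF-8)') before the first read.

# Encode output so the terminal receives UTF-8 bytes.
binmode(STDOUT, ':encoding(UTF-8)');
print length($filestring), "\t", $filestring, "\n";
```

With the layer in place, `length` counts the schwa as a single character rather than two bytes.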
Re: Setting UTF-8 mode on filehandle reads?
by grantm (Parson) on Dec 06, 2002 at 01:14 UTC
by ph0enix (Friar) on Dec 06, 2002 at 12:32 UTC
by grantm (Parson) on Dec 06, 2002 at 18:14 UTC

Re: Setting UTF-8 mode on filehandle reads?
by diotalevi (Canon) on Dec 06, 2002 at 01:03 UTC

Re: Setting UTF-8 mode on filehandle reads?
by pg (Canon) on Dec 06, 2002 at 15:42 UTC
by grantm (Parson) on Dec 07, 2002 at 01:47 UTC
by Anonymous Monk on Dec 20, 2012 at 19:30 UTC