Test 1 - char is hard-coded in script:
use utf8;
use Devel::Peek;
$x="ü"; #<-- unicode char here
print Dump($x);
use bytes;
print length($x);
__END__
Output (ok):
SV = PV(0x15d5584) at 0x1a45848
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x15d91dc "\303\274"\0 [UTF8 "\x{fc}"]
CUR = 2
LEN = 3
2
Test 2 - char code is hard-coded in script:
use utf8;
use Devel::Peek;
$x="\x{00fc}";
print Dump($x);
use bytes;
print length($x);
__END__
Output (not ok):
SV = PV(0x15d5584) at 0x1a45848
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x15d91dc "\374"\0
CUR = 1
LEN = 2
1
Test 3 - char is read from a file which contains only one char
(0x00fc):
use utf8;
use Devel::Peek;
open(IN, "uni.txt") or die "Cannot open uni.txt: $!";
binmode(IN,":utf8");
$x=<IN>;
chomp($x);
print Dump($x);
use bytes;
print length($x);
__END__
Output (ok):
SV = PV(0x15d5584) at 0x1a4583c
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x1a78eec "\303\274"\0 [UTF8 "\x{fc}"]
CUR = 2
LEN = 80
2
I could've sworn yesterday that "Test 4" doesn't work. Have to investigate a little more. What's up with "Test 2"?
Test 4 - Reading "directly" from STDIN (command prompt) was apparently wrong.
Thanks,
mrd
update: This is weird:
use utf8;
use Devel::Peek;
#$x="\x{00fc}"; #<-- not ok!!
#$x = "ü"; #<-- char above. ok
#$x="\x{0103}"; #<-- ok
#$x = "ă"; # char above. ok.
print Dump($x);
use bytes;
print length($x);
__END__
I edit my files (text & code) with vim 6.1 and have "encoding=utf-8" set.
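One way to narrow down "Test 2" might be to check whether the two forms really hold different strings, or only differ in how Perl stores them internally. The following is a minimal sketch, assuming the difference is purely internal storage (a code point below 0x100 can be kept as a single byte without the UTF8 flag). The variable names $narrow and $wide are made up for illustration, and the escape form is used for both so the result does not depend on how the script file is saved:

```perl
use strict;
use warnings;
use bytes ();   # load the module for bytes::length without enabling the pragma

# Hypothetical names: $narrow is built as in Test 2, $wide is a copy
# that is forced into the internal UTF-8 representation.
my $narrow = "\x{00fc}";
my $wide   = $narrow;
utf8::upgrade($wide);   # changes internal storage only, not the characters

# Character-level view: same string, same length in characters?
print "equal as strings: ", ($narrow eq $wide ? "yes" : "no"), "\n";
print "length in chars:  ", length($narrow), " / ", length($wide), "\n";

# Internal view: the UTF8 flag that Devel::Peek shows, and the byte counts.
print "UTF8 flag:        ", (utf8::is_utf8($narrow) ? 1 : 0), " / ",
                            (utf8::is_utf8($wide)   ? 1 : 0), "\n";
print "length in bytes:  ", bytes::length($narrow), " / ",
                            bytes::length($wide), "\n";
```

If the two compare equal as strings while the flag and byte length differ, that would suggest "Test 2" is not broken: length() under "use bytes" is reporting the internal storage rather than the number of characters.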