use utf8;
use Devel::Peek;
$x = "ü";    #<-- unicode char here
print Dump($x);
use bytes;
print length($x);
__END__
Output (ok):
SV = PV(0x15d5584) at 0x1a45848
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x15d91dc "\303\274"\0 [UTF8 "\x{fc}"]
  CUR = 2
  LEN = 3
2
Test 2 - the char code is written as a \x{...} escape in the script:
use utf8;
use Devel::Peek;
$x = "\x{00fc}";
print Dump($x);
use bytes;
print length($x);
__END__
Output (not ok):
SV = PV(0x15d5584) at 0x1a45848
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x15d91dc "\374"\0
  CUR = 1
  LEN = 2
1
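A plausible reading of Test 2, offered as a sketch rather than a definitive answer: Perl may store a string whose characters are all below 0x100 one byte per character, with the UTF8 flag off. "\x{00fc}" is such a string, so `use bytes` sees the single byte \374. The literal ü in Test 1 is decoded from the UTF-8 source under `use utf8`, which leaves it flagged and stored as two bytes. The snippet assumes the core Encode module and the built-in utf8::is_utf8 / utf8::upgrade functions; it shows that bytes-length follows the internal representation, while encoding explicitly to UTF-8 gives 2 either way.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $x = "\x{00fc}";   # same value as Test 2

# Code points below 0x100 can be stored one byte per character,
# so the UTF8 flag is off, as in the Dump output above.
printf "UTF8 flag: %s\n", utf8::is_utf8($x) ? 'on' : 'off';

{
    use bytes;   # lexically scoped: only affects this block
    printf "internal bytes: %d\n", length($x);
}

# Same characters, re-stored in Perl's internal UTF-8 form.
utf8::upgrade($x);
{
    use bytes;
    printf "internal bytes after upgrade: %d\n", length($x);
}

# Encoding explicitly counts UTF-8 bytes whatever the internal form.
printf "UTF-8 bytes: %d\n", length(encode('UTF-8', $x));
```

This should report the flag as off, 1 internal byte, 2 after the upgrade, and 2 UTF-8 bytes. Since the UTF8 flag is an internal detail, counting the bytes of encode('UTF-8', ...) is probably the safer way to get the "length in bytes of utf8 string".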
Test 3 - the char is read from a file (uni.txt) that contains only that one char (0x00fc):
use utf8;
use Devel::Peek;
open(IN, "uni.txt") or die "Can't open uni.txt: $!";
binmode(IN, ":utf8");
$x = <IN>;
chomp($x);
print Dump($x);
use bytes;
print length($x);
__END__
Output (ok):
SV = PV(0x15d5584) at 0x1a4583c
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x1a78eec "\303\274"\0 [UTF8 "\x{fc}"]
  CUR = 2
  LEN = 80
2
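Test 3 can also be written as a self-contained sketch. The temp file below is a hypothetical stand-in for uni.txt, and the sketch swaps the `:utf8` layer for `:encoding(UTF-8)`, which validates the input instead of trusting it; that substitution and the use of File::Temp and Encode are my assumptions, not part of the original test.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Encode qw(encode);

# Hypothetical stand-in for uni.txt: U+00FC encoded as UTF-8
# (the two bytes 0xC3 0xBC) followed by a newline.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);                       # write raw bytes, no translation
print {$fh} "\xC3\xBC\n";
close($fh) or die "close failed: $!";

# Three-argument open with an explicit decoding layer.
open(my $in, '<:encoding(UTF-8)', $path) or die "Can't open $path: $!";
my $x = <$in>;
close($in);
chomp($x);

printf "characters: %d\n", length($x);                    # 1
printf "UTF-8 bytes: %d\n", length(encode('UTF-8', $x));  # 2
```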
I could've sworn yesterday that "Test 4" didn't work. I have to investigate a little more. But what's up with "Test 2"? Why does it report 1 byte and no UTF8 flag, while Tests 1 and 3 report 2?
Test 4 - Reading "directly" from STDIN (command prompt) was apparently wrong.
Thanks,
mrd
update: This is weird:
use utf8;
use Devel::Peek;
#$x = "\x{00fc}";   #<-- not ok!!
#$x = "ü";          #<-- same char as above, typed literally. ok
#$x = "\x{0103}";   #<-- ok
#$x = "ă";          #<-- same char as above, typed literally. ok
print Dump($x);
use bytes;
print length($x);
__END__
I edit my files (text and code) with vim 6.1, with "encoding=utf-8" set.
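One way to see the pattern in the update is to print the flag and both byte counts for all four variants. Escapes above 0xFF (like \x{0103}) can only be stored as UTF-8, and literals typed in a UTF-8 file are decoded under `use utf8`, so those three come out flagged; only \x{00fc} stays one byte. To avoid depending on how the script file is encoded, this sketch simulates the literals with utf8::decode on their UTF-8 bytes; the helper name decoded() is hypothetical, and Encode is assumed for the byte count.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: mimic what `use utf8` does to a literal typed in a
# UTF-8 source file, i.e. decode its UTF-8 bytes into characters.
sub decoded {
    my ($octets) = @_;      # copy, so the caller's string is untouched
    utf8::decode($octets);  # turns the UTF8 flag on for multi-byte input
    return $octets;
}

# The four variants from the update, in the same order.
my @cases = (
    [ 'U+00FC via \x{00fc}', "\x{00fc}"          ],
    [ 'U+00FC as literal',   decoded("\xC3\xBC") ],
    [ 'U+0103 via \x{0103}', "\x{0103}"          ],
    [ 'U+0103 as literal',   decoded("\xC4\x83") ],
);

for my $case (@cases) {
    my ($name, $s) = @$case;
    my $internal = do { use bytes; length($s) };   # size of Perl's buffer
    printf "%-20s flag=%-3s internal=%d utf8_bytes=%d\n",
        $name,
        (utf8::is_utf8($s) ? 'on' : 'off'),
        $internal,
        length(encode('UTF-8', $s));
}
```

All four should show utf8_bytes=2; only the first row should show flag=off and internal=1, matching the "not ok!!" case above.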
In reply to "Re: Re: length in bytes of utf8 string" by mrd, in thread "length in bytes of utf8 string" by mrd