in reply to Why does perl's internal utf8 seem to allow single-byte latin1?
What I expected from the script was the two byte sequence in all cases.
Your expectations are wrong for ustring1. There's nothing that caused it to be changed to the less efficient storage format.
utf8::is_utf8 pointed this out, and pointed out your expectations were accurate for ustring2 and ustring3.
print_chrcode doesn't look at the internal format. It looks at the content of the string. That's why it didn't tell you anything.
( The previous paragraph is wrong if you happen to use the buggy version of Perl the OP is using. I didn't notice the OP had included the output of this program. With 5.10, you get
)
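To illustrate the difference between the internal format and the content, here is a minimal sketch. It assumes a Perl without the bug mentioned above; the sample string and variable names are made up for this example, not taken from the OP's script. It stores the same 23 characters in both internal formats and compares what utf8::is_utf8 reports with what content-level operations report.

```perl
use strict;
use warnings;

# Illustrative data: the same 23-character string stored two ways.
my $native   = "Ein \xD6konomisches Modell";   # one byte per character
my $upgraded = $native;
utf8::upgrade($upgraded);                      # same characters, UTF8 flag on

# The internal storage format differs...
my $flag_native   = utf8::is_utf8($native)   ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# ...but anything that looks at the content sees no difference.
my $same_content  = ($native eq $upgraded)                ? 1 : 0;
my $same_length   = (length($native) == length($upgraded)) ? 1 : 0;
my $code_native   = ord(substr($native,   4, 1));   # the "Ö"
my $code_upgraded = ord(substr($upgraded, 4, 1));   # the "Ö"

printf "native:   is_utf8=%d  char 4 = 0x%02X\n", $flag_native,   $code_native;
printf "upgraded: is_utf8=%d  char 4 = 0x%02X\n", $flag_upgraded, $code_upgraded;
printf "eq: %d  same length: %d\n", $same_content, $same_length;
```

Only the is_utf8 column differs between the two lines of output, which is why a routine that prints character codes can't tell you anything about the storage format.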
How can I force the internal perl representation to be two-byte utf-8
utf8::upgrade and utf8::downgrade are used to switch between the two internal formats.
use Devel::Peek qw( Dump );

my $s1 = "Ein Ökonomisches Modell";
my $s2 = "Ein \326konomisches Modell";

Dump($s1);
Dump($s2);

utf8::upgrade( $s1 );
utf8::upgrade( $s2 );

Dump($s1);
Dump($s2);
SV = PV(0x2369cc) at 0x182a354
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x23fcc4 "Ein \326konomisches Modell"\0
  CUR = 23
  LEN = 24
SV = PV(0x2369dc) at 0x182a384
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x23fd9c "Ein \326konomisches Modell"\0
  CUR = 23
  LEN = 24
SV = PV(0x2369cc) at 0x182a354
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x182430c "Ein \303\226konomisches Modell"\0 [UTF8 "Ein \x{d6}konomisches Modell"]
  CUR = 24
  LEN = 25
SV = PV(0x2369dc) at 0x182a384
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x1832744 "Ein \303\226konomisches Modell"\0 [UTF8 "Ein \x{d6}konomisches Modell"]
  CUR = 24
  LEN = 25
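The output above only shows utf8::upgrade. As a rough sketch of the other direction (the strings and variable names here are illustrative), utf8::downgrade switches back to the one-byte-per-character format, and it can only succeed when every character is below 256. Passing a true second argument makes it return false on failure instead of dying.

```perl
use strict;
use warnings;

my $s = "Ein \xD6konomisches Modell";

utf8::upgrade($s);
my $after_upgrade = utf8::is_utf8($s) ? 1 : 0;      # UTF8 flag is now on

# Second argument (FAIL_OK) true: return false rather than die on failure.
my $downgraded_ok   = utf8::downgrade($s, 1) ? 1 : 0;
my $after_downgrade = utf8::is_utf8($s) ? 1 : 0;    # UTF8 flag is off again

# A string with a character above 255 has no downgraded form.
my $wide    = "\x{263A}";
my $wide_ok = utf8::downgrade($wide, 1) ? 1 : 0;

print "after upgrade:   is_utf8=$after_upgrade\n";
print "after downgrade: is_utf8=$after_downgrade (ok=$downgraded_ok)\n";
print "wide string downgrade ok=$wide_ok\n";
```

In neither direction does the string's content change; only the storage format does.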
All that being said, I have no idea what you are trying to accomplish. It sounds very, very wrong.