in reply to Jargon relating to Perl strings
$x = "\xEC"; utf8::upgrade($x);
Now $x consists of a single byte, even though it requires 16 bits of encoding.
Perhaps the confusion comes from saying that for your definition of a byte, the UTF8 flag doesn't matter, yet it refers to a string element, which is defined in terms of substr, for which the UTF8 flag *does* matter.
I'd say that in my example, $x ends up having 2 bytes, but one character. This is also the difference wc makes.
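To make the two counts concrete, here is a minimal sketch of the example above. The variable names ($chars, $octets, $encoded) are mine, chosen for illustration. It assumes that "bytes" means the octets of the UTF-8 encoded form, measured here by encoding a copy with utf8::encode, and that "characters" means the elements that length and substr operate on. This is roughly the distinction between wc -c and wc -m.

```perl
use strict;
use warnings;

my $x = "\xEC";        # one string element, code point 0xEC
utf8::upgrade($x);     # change the internal storage to UTF-8; the value is unchanged

# Elements as seen by length and substr: still one.
my $chars = length($x);

# Octets needed for the UTF-8 form: encode a copy and count its elements.
my $encoded = $x;
utf8::encode($encoded);            # $encoded is now "\xC3\xAC"
my $octets = length($encoded);

printf "characters: %d, octets: %d, UTF8 flag: %s\n",
    $chars, $octets, (utf8::is_utf8($x) ? "on" : "off");
# prints "characters: 1, octets: 2, UTF8 flag: on"
```

substr($x, 0, 1) still compares equal to "\xEC" after the upgrade, so whether you call the result one byte or two depends entirely on which of these two counts your definition picks.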
Of course, you are free to use whatever definition you want -- just keep in mind that not all people share your definition. Some people prefer not to use the term byte at all, just character and octet.
Replies are listed 'Best First'.
Re^2: Jargon relating to Perl strings
by ikegami (Patriarch) on Jan 17, 2012 at 22:23 UTC
by JavaFan (Canon) on Jan 18, 2012 at 09:23 UTC
by ikegami (Patriarch) on Jan 19, 2012 at 01:36 UTC