Re: problem with chr function

From utf8 docs:

Note that if you have bytes with the eighth bit on in your script (for example embedded Latin-1 in your string literals), "use utf8" will be unhappy since the bytes are most probably not well-formed UTF-8.

I guess that, since 192 is "malformed", it is not re-encoded to utf8.

Update: From the "use encoding" docs (perl 5.8 only?):

This pragma also affects encoding of the 0x80..0xFF code point range: normally characters in that range are left as eight-bit bytes (unless they are combined with characters with code points 0x100 or larger, in which case all characters need to become UTF-8 encoded), but if the encoding pragma is present, even the 0x80..0xFF range always gets UTF-8 encoded.
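To make the quoted behaviour concrete, here is a minimal sketch without the encoding pragma. The variable names are mine, and it assumes a stock perl 5.8.1 or later, where the utf8:: helper functions are available without loading anything. It looks at how chr(192) is stored on its own, what happens when it is joined with a code point of 0x100 or above, and what the explicit UTF-8 encoding of that character looks like.

```perl
use strict;
use warnings;

# chr(192) on its own: normally left as a single eight-bit byte internally.
my $single = chr(192);
my $single_flag = utf8::is_utf8($single) ? 1 : 0;

# Joined with a code point >= 0x100, the whole string has to be
# stored as UTF-8 internally.
my $wide = $single . chr(0x100);
my $wide_flag = utf8::is_utf8($wide) ? 1 : 0;

# Explicitly encode a copy to UTF-8 octets and dump them in hex.
my $octets = $single;
utf8::encode($octets);
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $octets;

print "chr(192) alone, internal UTF-8 flag: $single_flag\n";
print "with chr(0x100), internal UTF-8 flag: $wide_flag\n";
print "UTF-8 octets of chr(192): $hex\n";    # C3 80
```

Either way the string still holds the same character, code point 192. Only the internal storage differs, which is what the paragraph quoted above describes.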

Re: Re: problem with chr function by John M. Dlugosz (Monsignor) on Oct 25, 2002 at 19:01 UTC
No, that is saying that if you use an 8-bit character set in all its glory in the script source, the parser will not accept it as UTF-8. That is, a byte containing 192 within the script, perhaps in a string literal, means whatever the console's code page says in a legacy script, but under "use utf8" Perl would get upset because the byte doesn't follow the rules it assumes for multi-byte characters.
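A small sketch of why a lone byte 192 breaks those rules (the variable names are illustrative, not from the thread): 0xC0 has the bit pattern of a multi-byte lead byte, so taken alone it is not well-formed UTF-8, whereas the two-byte sequence 0xC3 0x80 decodes to code point 192. This uses utf8::decode on ordinary strings rather than on script source, on the assumption that the same well-formedness rules apply.

```perl
use strict;
use warnings;

# A lone byte 192, as it would appear in Latin-1 source text.
my $lone = "\xC0";
my $lone_ok = utf8::decode($lone) ? 1 : 0;    # false: not well-formed UTF-8

# The well-formed UTF-8 encoding of the same character.
my $pair = "\xC3\x80";
my $pair_ok = utf8::decode($pair) ? 1 : 0;    # true: decoded in place
my $code_point = ord($pair);

print "lone 0xC0 decodes as UTF-8: $lone_ok\n";
print "0xC3 0x80 decodes as UTF-8: $pair_ok (code point $code_point)\n";
```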