in reply to Re^3: The Queensrÿche Situation
in thread The Queensrÿche Situation

Right. So #1 is utf-8. Then #2 is utf-16?

So then why does this:

use utf8; my $string = "Queensrÿche"; no utf8;
produce this:
81 Q Q 117 u u 101 e e 101 e e 110 n n 115 s s 114 r r 255 {ff} 99 c c 104 h h 101 e e - this is utf8
when this:
#use utf8; my $string = "Queensrÿche"; #no utf8;
produces this:
81 Q Q 117 u u 101 e e 101 e e 110 n n 115 s s 114 r r 195 191 99 c c 104 h h 101 e e - this is NOT utf8
If the two bytes are "there", why is "use utf8" yielding a dec 255 for the "ÿ", which is not valid UTF-8?

"The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode." - http://en.wikipedia.org/wiki/UTF-8

Re^5: The Queensrÿche Situation
by Jim (Curate) on Oct 19, 2014 at 20:02 UTC
    Right. So #1 is utf-8. Then #2 is utf-16?

    No, #2 is ISO-8859-1, which is also known as Latin 1. As it happens, it's also Windows-1252, which today is really a quasi-superset of ISO-8859-1. Neither ISO-8859-1 nor Windows-1252 is a Unicode encoding, so #2 is not in any Unicode character encoding scheme such as UTF-16.

    The character encodings ISO-8859-1 (Latin 1) and Windows-1252 are often referred to as "legacy encodings," especially vis-à-vis Unicode.
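
The difference between the two dumps can be reproduced with a minimal sketch, assuming the character in question is U+00FF (ÿ), which is what the dec 255 and 195 191 values in the question suggest. It uses the core Encode module; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS.
# (Assumption: this is the character used in the thread title.)
my $char = "\x{FF}";

# Encoded as UTF-8, the character occupies two bytes: 195 191 (0xC3 0xBF).
my @utf8_bytes = unpack 'C*', encode('UTF-8', $char);

# Encoded as ISO-8859-1 (Latin 1), it occupies one byte: 255 (0xFF).
my @latin1_bytes = unpack 'C*', encode('ISO-8859-1', $char);

# Decoding the two UTF-8 bytes gives back a single character
# whose code point is 255.
my $raw         = "\xC3\xBF";
my $decoded     = decode('UTF-8', $raw);
my @code_points = map { ord } split //, $decoded;

print "UTF-8 bytes:         @utf8_bytes\n";
print "Latin-1 bytes:       @latin1_bytes\n";
print "Decoded code points: @code_points\n";
```

Read this way, the 255 reported under "use utf8" is a code point, not a byte: the pragma tells Perl to decode the UTF-8 source, so the two bytes 195 191 become the single character U+00FF. Without the pragma, the string literal keeps the two raw bytes.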