The problem is that byte 8F is not defined in cp1252, so what you have isn't (valid) cp1252.
Byte  cp1252  Want                                Ok?
----  ------  ----------------------------------  ---
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
71    U+0071  U+0071 LATIN SMALL LETTER Q         yes
79    U+0079  U+0079 LATIN SMALL LETTER Y         yes
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
7B    U+007B  U+007B LEFT CURLY BRACKET           yes
7A    U+007A  U+007A LATIN SMALL LETTER Z         yes
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
B7    U+00B7  U+00B7 MIDDLE DOT                   yes
7B    U+007B  U+007B LEFT CURLY BRACKET           yes
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
8F    ------  U+008F SINGLE SHIFT THREE           NO!
79    U+0079  U+0079 LATIN SMALL LETTER Y         yes
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
A3    U+00A3  U+00A3 POUND SIGN                   yes
7F    U+007F  U+007F DELETE                       yes
84    U+201E  U+201E DOUBLE LOW-9 QUOTATION MARK  yes
8F    ------  U+008F SINGLE SHIFT THREE           NO!
7E    U+007E  U+007E TILDE                        yes
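As a quick way to double-check the table, here is a minimal sketch that asks Encode which single bytes it refuses to decode as cp1252. It assumes the Encode module bundled with Perl and uses Encode::FB_CROAK so that unmappable bytes raise an exception. The variable names are purely illustrative. The list it prints should include 8F.

```perl
use strict;
use warnings;
use feature qw( say );

use Encode qw( decode );

# Collect every byte value that Encode's cp1252 table cannot map.
my @undefined;
for my $byte (0x00 .. 0xFF) {
    # Use a fresh copy each time: decode() may modify its input
    # when a CHECK value is given.
    my $octets = chr($byte);

    # FB_CROAK makes decode() die on a byte with no mapping.
    if (!eval { decode('cp1252', $octets, Encode::FB_CROAK); 1 }) {
        push @undefined, sprintf('%02X', $byte);
    }
}

say "@undefined";
```

Any byte listed here needs special handling before the data can be treated as cp1252.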
It doesn't appear to be any other encoding supported by Encode either.
use strict;
use warnings;
use feature qw( say );

use Encode qw( decode encode );

my $have = "\x84\x71\x79\x84\x7B\x7A\x84\xB7"
         . "\x7B\x84\x8F\x79\x84\xA3\x7F\x84"
         . "\x8F\x7E";

my $want = "\x{201E}\x{0071}\x{0079}\x{201E}"
         . "\x{007B}\x{007A}\x{201E}\x{00B7}"
         . "\x{007B}\x{201E}\x{008F}\x{0079}"
         . "\x{201E}\x{00A3}\x{007F}\x{201E}"
         . "\x{008F}\x{007E}";

# Try every encoding Encode knows about and print the name of any
# that decodes $have to exactly $want.
for (Encode->encodings(':all')) {
    my $got;
    if (!eval { $got = decode($_, $have); 1 }) {
        warn $@;
        next;
    }

    say if $got eq $want;
}
-- empty output except for bad data errors --
What your editor appears to be doing is decoding the bytes as cp1252 and mapping each undefined byte to the Unicode character with the same code point (so byte 8F becomes U+008F).
use strict;
use warnings;
use feature qw( say );

use Encode qw( decode encode_utf8 );

my $have = "\x84\x71\x79\x84\x7B\x7A\x84\xB7"
         . "\x7B\x84\x8F\x79\x84\xA3\x7F\x84"
         . "\x8F\x7E";

my $want = "\x{201E}\x{0071}\x{0079}\x{201E}"
         . "\x{007B}\x{007A}\x{201E}\x{00B7}"
         . "\x{007B}\x{201E}\x{008F}\x{0079}"
         . "\x{201E}\x{00A3}\x{007F}\x{201E}"
         . "\x{008F}\x{007E}";

# For each byte cp1252 can't decode, fall back to the character
# with the same code point as the byte.
my $got = decode('cp1252', $have, sub { encode_utf8(chr($_[0])) });

say "match" if $got eq $want;
match