Let's finish what we started, shall we? Let's begin with what you've got, add some more specific entities, and finally build a converter with it. You got all those elements already via the Chatterbox, but perhaps a few details got lost. The conversion table for the Windows character set (Windows-1252) comes from this file: note that it differs from ISO-Latin-1/Unicode only in the range 128-159.
# preparation
my %subst = (map({ chr($_) => "&#$_;" } 0 .. 255),
# a few special ones
'<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;',
# Windows specific
map({ chr($_->[0]) => "&#$_->[1];" }
[0x80 => 0x20AC], [0x82 => 0x201A], [0x83 => 0x0192],
[0x84 => 0x201E], [0x85 => 0x2026], [0x86 => 0x2020],
[0x87 => 0x2021], [0x88 => 0x02C6], [0x89 => 0x2030],
[0x8A => 0x0160], [0x8B => 0x2039], [0x8C => 0x0152],
[0x8E => 0x017D], [0x91 => 0x2018], [0x92 => 0x2019],
[0x93 => 0x201C], [0x94 => 0x201D], [0x95 => 0x2022],
[0x96 => 0x2013], [0x97 => 0x2014], [0x98 => 0x02DC],
[0x99 => 0x2122], [0x9A => 0x0161], [0x9B => 0x203A],
[0x9C => 0x0153], [0x9E => 0x017E], [0x9F => 0x0178]));
# sample string
$_ = "maître d'hôtel";
# for the substitution, for each string, do:
s/([&<>'"\177-\377])/$subst{$1}/g;
print;
Result:
ma&#238;tre d&#39;h&#244;tel
which a browser displays as: maître d'hôtel
N.B. This code was developed for perl 5.005, i.e. before perl had built-in Unicode support.
And of course I tested it with Windows-specific characters, like "€".
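For comparison, here is a rough sketch of how the same conversion might look on a perl with built-in Unicode support (5.8 or later), where the core Encode module can replace the hand-built Windows table. The sub name cp1252_to_entities is made up for this illustration, and the sketch assumes the input is a string of raw Windows-1252 bytes; it is not the code I tested above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of the original script.
# Takes raw Windows-1252 bytes, returns pure-ASCII text with HTML entities.
sub cp1252_to_entities {
    my ($bytes) = @_;

    # Let Encode map the bytes (including 0x80-0x9F) to Unicode characters.
    my $text = decode('cp1252', $bytes);

    # Named entities for the markup characters; everything else that is
    # matched (quotes' apostrophe, DEL and all non-ASCII) becomes numeric.
    my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');
    $text =~ s/([&<>'"]|[^\x00-\x7E])/$named{$1} || '&#' . ord($1) . ';'/ge;

    return $text;
}

# Same sample string as above, written with byte escapes, plus a euro sign (0x80).
print cp1252_to_entities("ma\xEEtre d'h\xF4tel \x80"), "\n";
```

This should print ma&#238;tre d&#39;h&#244;tel &#8364;, i.e. the same entities the lookup table produces, with the euro sign mapped to U+20AC.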