in reply to High-bit ISO Latin character conversion problem.
    # preparation
    my %subst = (
        map({ chr($_) => "&#$_;" } 0 .. 255),
        # a few special ones
        '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;',
        # Windows specific
        map({ chr($_->[0]) => "&#$_->[1];" }
            [0x80 => 0x20AC], [0x82 => 0x201A], [0x83 => 0x0192],
            [0x84 => 0x201E], [0x85 => 0x2026], [0x86 => 0x2020],
            [0x87 => 0x2021], [0x88 => 0x02C6], [0x89 => 0x2030],
            [0x8A => 0x0160], [0x8B => 0x2039], [0x8C => 0x0152],
            [0x8E => 0x017D], [0x91 => 0x2018], [0x92 => 0x2019],
            [0x93 => 0x201C], [0x94 => 0x201D], [0x95 => 0x2022],
            [0x96 => 0x2013], [0x97 => 0x2014], [0x98 => 0x02DC],
            [0x99 => 0x2122], [0x9A => 0x0161], [0x9B => 0x203A],
            [0x9C => 0x0153], [0x9E => 0x017E], [0x9F => 0x0178]),
    );

    # sample string
    $_ = "maître d'hôtel";
    # for the substitution, for each string, do:
    s/([&<>'"\177-\377])/$subst{$1}/g;
    print;

Result:

    ma&#238;tre d&#39;h&#244;tel
N.B. This code was developed for perl 5.005, i.e. before perl had built-in Unicode support.
And of course I tested it with Windows-specific characters, like "€".
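For anyone on a perl with built-in Unicode support (5.8 or later, where Encode is a core module), the same idea can probably be done without the hand-written table. The following is only a sketch under that assumption, not what I ran on 5.005: the helper name cp1252_to_entities is made up for illustration, and it assumes the input really is a string of Windows-1252 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Named entities for the HTML-special characters, as in the table above.
my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical helper (not part of the original post): convert raw
# Windows-1252 bytes into ASCII-only HTML.
sub cp1252_to_entities {
    my ($bytes) = @_;
    # Decode bytes into characters; byte 0x80 becomes U+20AC, and so on.
    my $text = decode('cp1252', $bytes);
    # Replace the HTML-special characters and everything outside 0x00-0x7E.
    $text =~ s{([&<>'"]|[^\x00-\x7E])}
              {exists $named{$1} ? $named{$1} : sprintf('&#%d;', ord $1)}ge;
    return $text;
}

# The sample string written as explicit bytes, plus the euro sign (0x80).
print cp1252_to_entities("ma\xEEtre d'h\xF4tel \x80"), "\n";
# prints: ma&#238;tre d&#39;h&#244;tel &#8364;
```

The design difference is that the code point comes from ord() after decoding, so the Windows-specific mapping lives in Encode rather than in the script.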
Re: Re: High-bit ISO Latin character conversion problem.
by shenme (Priest) on Sep 06, 2003 at 20:51 UTC