in reply to High-bit ISO Latin character conversion problem.

Let's finish what we started, shall we? Let's begin with what you got, add some more specific entities, and finally build a converter with it. You got all those elements already via the Chatterbox, but perhaps a few details got lost. The conversion table for the Windows character set comes from this file: note that it only differs from ISO-Latin-1/Unicode in the range 128-159.
# preparation
my %subst = (
    # default: every character becomes a numeric entity
    map({ chr($_) => "&#$_;" } 0 .. 255),
    # a few special ones
    '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;',
    # Windows specific
    map({ chr($_->[0]) => "&#$_->[1];" }
        [0x80 => 0x20AC], [0x82 => 0x201A], [0x83 => 0x0192], [0x84 => 0x201E],
        [0x85 => 0x2026], [0x86 => 0x2020], [0x87 => 0x2021], [0x88 => 0x02C6],
        [0x89 => 0x2030], [0x8A => 0x0160], [0x8B => 0x2039], [0x8C => 0x0152],
        [0x8E => 0x017D], [0x91 => 0x2018], [0x92 => 0x2019], [0x93 => 0x201C],
        [0x94 => 0x201D], [0x95 => 0x2022], [0x96 => 0x2013], [0x97 => 0x2014],
        [0x98 => 0x02DC], [0x99 => 0x2122], [0x9A => 0x0161], [0x9B => 0x203A],
        [0x9C => 0x0153], [0x9E => 0x017E], [0x9F => 0x0178]));

# sample string
$_ = "maître d'hôtel";

# for the substitution, for each string, do:
s/([&<>'"\177-\377])/$subst{$1}/g;
print;
Result:
ma&#238;tre d&#39;h&#244;tel

N.B. This code was developed for perl 5.005, i.e. before perl had built-in Unicode support.
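
On a perl that does have built-in Unicode support (5.8 and later), roughly the same conversion can be sketched with the core Encode module instead of a hand-built table. This is only a sketch: the helper name cp1252_to_entities is made up for illustration, and it assumes the input really is a string of Windows-1252 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The four characters that get a named entity, as in the table above.
my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical helper: takes Windows-1252 bytes, returns pure-ASCII HTML text.
sub cp1252_to_entities {
    my ($bytes) = @_;

    # Turn the bytes into Perl characters; 0x80 becomes U+20AC, and so on.
    my $text = decode('cp1252', $bytes);

    # Replace markup characters, the apostrophe, and everything from DEL upwards.
    $text =~ s{([&<>'"]|[^\x00-\x7E])}
              {exists $named{$1} ? $named{$1} : '&#' . ord($1) . ';'}ge;
    return $text;
}

# \xEE and \xF4 are the accented letters of the sample string, \x80 is the euro sign.
print cp1252_to_entities("ma\xEEtre d'h\xF4tel \x80"), "\n";
# prints: ma&#238;tre d&#39;h&#244;tel &#8364;
```

The decode step does the job of the Windows-specific part of the table, so ord() already returns the Unicode code point.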

And of course I tested it with Windows-specific characters, like "€".
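
A check along those lines might look like the following. It is a minimal sketch that uses an abbreviated copy of the table (only three of the Windows-specific entries, picked for illustration) and writes the test characters as \x escapes so that the encoding of the source file doesn't matter.

```perl
use strict;
use warnings;

# Abbreviated table: the full-range default plus three Windows-specific overrides.
# Later pairs win over earlier ones with the same key.
my %subst = (
    map({ ( chr($_) => "&#$_;" ) } 0 .. 255),
    map({ ( chr($_->[0]) => "&#$_->[1];" ) }
        [0x80 => 0x20AC],    # euro sign
        [0x85 => 0x2026],    # horizontal ellipsis
        [0x99 => 0x2122]),   # trade mark sign
);

# Test string as Windows bytes: euro, ellipsis, trade mark.
my $text = "price: 5 \x80\x85 Perl\x99";

(my $html = $text) =~ s/([&<>'"\177-\377])/$subst{$1}/g;
print "$html\n";
# prints: price: 5 &#8364;&#8230; Perl&#8482;
```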

Re: Re: High-bit ISO Latin character conversion problem.
by shenme (Priest) on Sep 06, 2003 at 20:51 UTC
    That broke the chameau's back - I'm starting a PM snips file! "Subroutines, Snips, Clues and just plain Wow"

    I've seen the technique "initialize entire range of values for hash and then selectively replace special instances as needed" before, but this beautifully emphasizes the subject data, and reads naturally in order of increasing 'specialization'.   Oh, yeah, and the code's useful, too.   (ğ)