Let's finish what we started, shall we? Let's begin with what you got, add some more specific entities, and finally build a converter with it. You got all those elements already via the Chatterbox, but perhaps a few details got lost. The conversion table for the Windows characters comes from this file: note that it only differs from ISO-Latin-1/Unicode in the range 128-159.
# preparation
my %subst = (
    # default: every byte becomes a numeric entity
    map({ chr($_) => "&#$_;" } 0 .. 255),
    # a few special ones
    '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;',
    # Windows specific
    map({ chr($_->[0]) => "&#$_->[1];" }
        [0x80 => 0x20AC], [0x82 => 0x201A], [0x83 => 0x0192],
        [0x84 => 0x201E], [0x85 => 0x2026], [0x86 => 0x2020],
        [0x87 => 0x2021], [0x88 => 0x02C6], [0x89 => 0x2030],
        [0x8A => 0x0160], [0x8B => 0x2039], [0x8C => 0x0152],
        [0x8E => 0x017D], [0x91 => 0x2018], [0x92 => 0x2019],
        [0x93 => 0x201C], [0x94 => 0x201D], [0x95 => 0x2022],
        [0x96 => 0x2013], [0x97 => 0x2014], [0x98 => 0x02DC],
        [0x99 => 0x2122], [0x9A => 0x0161], [0x9B => 0x203A],
        [0x9C => 0x0153], [0x9E => 0x017E], [0x9F => 0x0178]));

# sample string
$_ = "maître d'hôtel";

# for the substitution, for each string, do:
s/([&<>'"\177-\377])/$subst{$1}/g;
print;
Result:
ma&#238;tre d&#39;h&#244;tel

N.B. This code was developed for perl 5.005, i.e. before perl had built-in Unicode support.

And of course I tested it with Windows-specific characters, like "€".
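For comparison, here is a minimal sketch of how the same conversion might look on a perl that does have built-in Unicode support (5.8 or later), assuming the input is a Windows-1252 byte string. It is not the code from this post: it swaps the hand-built table for the core Encode module's cp1252 decoder, and the sub name cp1252_to_entities is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Named entities for the characters that are significant in markup.
my %named = ('<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;');

# Hypothetical helper (not from the original post): takes a string of
# Windows-1252 bytes and returns plain-ASCII HTML.
sub cp1252_to_entities {
    my ($bytes) = @_;

    # Bytes -> Perl characters; Encode supplies the 128-159 mapping.
    my $text = decode('cp1252', $bytes);

    # Replace markup characters, the apostrophe, and everything from
    # DEL (127) upward with a named or numeric entity.
    $text =~ s/([&<>'"]|[^\x00-\x7E])/$named{$1} || '&#' . ord($1) . ';'/ge;
    return $text;
}

print cp1252_to_entities("ma\xEEtre d'h\xF4tel"), "\n";  # ma&#238;tre d&#39;h&#244;tel
print cp1252_to_entities("5 \x80"), "\n";                # 5 &#8364;
```

The sample strings use \x escapes so the result does not depend on the encoding the script file itself is saved in.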


In reply to Re: High-bit ISO Lating character conversion problem. by bart
in thread High-bit ISO Latin character conversion problem. by true
