Thanks for your response.

For me, your string returned:

93 64 6f 75 62 6c 65 94 d1 20 d2 201c 64 6f 75 62 6c 65 201d 2018 73 69 6e 67 6c 65 2019
xD1 and xD2 are outside the range being checked (x80-x9F) and are legal Unicode (the cp1252/Unicode chart gives the same code points for both), so encode_entities returned (as you found):
&Ntilde; &Ograve;
Note that, for example, ’ came back as x2019, so it would not be touched by the cp1252 replacement.
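
To make the distinction concrete, here is a minimal sketch using the core Encode module. It is only an illustration, not the code from the script under discussion, and the helper name cp1252_to_codepoint is my own invention. It shows that bytes in the x80-x9F range move to different code points when read as cp1252, while xD1 and xD2 keep the same value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: treat $byte as a single cp1252 octet and
# return the Unicode code point it decodes to.
sub cp1252_to_codepoint {
    my ($byte) = @_;
    my $char = decode('cp1252', chr $byte);
    return ord $char;
}

# 0x93 and 0x94 are in the x80-x9F range; 0xD1 and 0xD2 are not.
for my $byte (0x93, 0x94, 0xD1, 0xD2) {
    printf "cp1252 0x%02X -> U+%04X\n", $byte, cp1252_to_codepoint($byte);
}
```

This should print U+201C and U+201D for the first two bytes and U+00D1 and U+00D2 for the last two, which matches the hex dump above.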

The comment in the script:

# "replaces HTML entities...
# with the corresponding Unicode character"
is from the HTML::Entities (H::E) doc and turned out to have more significance than I first realised.

But I too am surprised that it appears to work. I wrote quite a bit of code to process UTF-8 and was a bit miffed that it seemed unnecessary!

This is my first outing in these waters, so I will be pleased to be corrected if I've got any of this tangled up.

Again, thanks for your comments,
John

unicode.org cp1252 chart

update:

Extract from the chart:

cp1252  unicode
0xD1    0x00D1  #LATIN CAPITAL LETTER N WITH TILDE
0xD2    0x00D2  #LATIN CAPITAL LETTER O WITH GRAVE

In reply to Re^2: Fixing suspect characters in HTML by wfsp
in thread Fixing suspect characters in HTML by wfsp