A somewhat more limited approach to the problem taken on in "Converting and cleaning Word's HTML export to valid HTML": change the %char2entity hash in HTML::Entities so that it holds proper translations for the non-ISO characters ([\200-\237], i.e. 128 to 159) that MS Office apps always seem to use when you save as HTML.
This snippet must (clearly) be inserted, or required in, at some point after use HTML::Entities and before the first time you call encode_entities.
Note that many browsers will hork up these entities anyway, but at least the enterprising folks viewing your source will know what you meant. :-)
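# 129, 141, 143, 144 and 157 are unassigned in Windows-1252, so they keep a plain numeric reference.
# 142 and 158 (Z/z with caron) have no HTML 4 named entity, hence &#381; and &#382;.
@HTML::Entities::char2entity{ map chr, 128 .. 159 } = qw(
    &euro;   &#129;   &sbquo;  &fnof;   &bdquo;  &hellip; &dagger; &Dagger;
    &circ;   &permil; &Scaron; &lsaquo; &OElig;  &#141;   &#381;   &#143;
    &#144;   &lsquo;  &rsquo;  &ldquo;  &rdquo;  &bull;   &ndash;  &mdash;
    &tilde;  &trade;  &scaron; &rsaquo; &oelig;  &#157;   &#382;   &Yuml;
);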
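For anyone who wants to see the effect before wiring it into a real script, here is a minimal sketch of the same trick. It is cut down to four of the 32 entries, and the %cp1252 hash and the sample string are made up for illustration; it assumes the default unsafe-character set of encode_entities.

```perl
use strict;
use warnings;
use HTML::Entities;

# Cut-down override: four illustrative entries only (hypothetical helper hash).
my %cp1252 = (
    0x80 => '&euro;',
    0x93 => '&ldquo;',
    0x94 => '&rdquo;',
    0x96 => '&ndash;',
);

# Must run after "use HTML::Entities" and before the first encode_entities call.
$HTML::Entities::char2entity{ chr $_ } = $cp1252{$_} for keys %cp1252;

# Made-up sample: raw bytes as an MS Office "save as HTML" might emit them.
my $text = "\x93Hello\x94 \x96 5\x80";

print encode_entities($text), "\n";
# prints: &ldquo;Hello&rdquo; &ndash; 5&euro;
```

Without the override, those four bytes would come out as bare numeric references to 128..159, which is exactly the range browsers are not obliged to understand.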