Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML; even if they were, they would likely be unprintable control characters. Tidy assumed you wanted to refer to a character with the same byte value in the specified encoding and replaced that reference with the Unicode equivalent.

####

$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE cp1252
$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE latin1
$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE macroman

####

$ perl -wle 'binmode(STDOUT, "encoding(cp1252)")||die; print "He said, \x{201C}I\x{2019}m r\x{E9}served.\x{201D}"' > sample
$ uniquote -vE latin1 sample
He said, \N{SET TRANSMIT STATE}I\N{PRIVATE USE TWO}m r\N{LATIN SMALL LETTER E WITH ACUTE}served.\N{CANCEL CHARACTER}
$ uniquote -vE cp1252 sample
He said, \N{LEFT DOUBLE QUOTATION MARK}I\N{RIGHT SINGLE QUOTATION MARK}m r\N{LATIN SMALL LETTER E WITH ACUTE}served.\N{RIGHT DOUBLE QUOTATION MARK}
$ uniquote -vE macroman sample
He said, \N{LATIN SMALL LETTER I WITH GRAVE}I\N{LATIN SMALL LETTER I WITH ACUTE}m r\N{LATIN CAPITAL LETTER E WITH GRAVE}served.\N{LATIN SMALL LETTER I WITH CIRCUMFLEX}

####

$ perl -wle 'binmode(STDOUT, "encoding(cp1252)")||die; print "He said, \x{201C}I\x{2019}m r\x{E9}served.\x{201D}"' | uniquote --encoding cp1252
He said, \N{U+201C}I\N{U+2019}m r\N{U+E9}served.\N{U+201D}
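
If uniquote is not installed, the same comparison can be roughly sketched in Python. This is only an illustration, not how uniquote works internally: it decodes the sample bytes under each candidate encoding and prints every non-ASCII character as `\N{U+XXXX}`. The function name `escape_non_ascii` is made up for this sketch, and the codec names `cp1252`, `latin-1`, and `mac_roman` are Python's own, assumed here to correspond to uniquote's `cp1252`, `latin1`, and `macroman`.

```python
# Sketch only: mimics the uniquote comparison using Python's standard codecs.
# The function name `escape_non_ascii` is hypothetical, not part of uniquote.

# The cp1252 bytes of the sample sentence (without the trailing newline).
SAMPLE = b"He said, \x93I\x92m r\xe9served.\x94"


def escape_non_ascii(raw: bytes, encoding: str) -> str:
    """Decode raw under `encoding`; render non-ASCII characters as \\N{U+XXXX}."""
    text = raw.decode(encoding)
    return "".join(
        ch if ord(ch) < 128 else "\\N{U+%X}" % ord(ch)
        for ch in text
    )


if __name__ == "__main__":
    # Same bytes, three guesses at the encoding: only one gives sensible quotes.
    for name in ("cp1252", "latin-1", "mac_roman"):
        print(f"{name:10}", escape_non_ascii(SAMPLE, name))
```

Only the `cp1252` guess yields the curly quotes (U+201C, U+2019, U+201D). Decoding as `latin-1` lands on U+0093, U+0092, and U+0094, which are exactly the control-range code points the Tidy message complains about.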