Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML;
even if they were, they would likely be unprintable control characters.
Tidy assumed you wanted to refer to a character with the same byte value in the
specified encoding and replaced that reference with the Unicode equivalent.
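The remapping that message describes can be sketched in a few lines of Python. This is only an illustration of the behavior, not Tidy's actual implementation; the function name `remap_c1_references` and the default of cp1252 are assumptions made for the example.

```python
import re

def remap_c1_references(html: str, encoding: str = "cp1252") -> str:
    """Rewrite numeric character references in the range 128-159.

    Hypothetical helper: each such reference is treated as a byte value
    in `encoding` and replaced by the character that byte decodes to.
    """
    def fix(match):
        text = match.group(1)
        # Accept both decimal (&#147;) and hexadecimal (&#x93;) forms.
        code = int(text[1:], 16) if text[0] in "xX" else int(text)
        if 128 <= code <= 159:
            try:
                return bytes([code]).decode(encoding)
            except UnicodeDecodeError:
                # Some byte values are unassigned in cp1252; leave them alone.
                return match.group(0)
        return match.group(0)

    return re.sub(r"&#([0-9]+|[xX][0-9A-Fa-f]+);", fix, html)

if __name__ == "__main__":
    print(remap_c1_references("&#147;I&#146;m r&#233;served.&#148;"))
```

References outside 128 to 159, such as `&#233;`, are left untouched, since those are already valid Unicode code points.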
$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE cp1252
$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE latin1
$ perl -nle 'print if /\P{ASCII}/' inputfile | uniquote -vE macroman
$ perl -wle 'binmode(STDOUT, "encoding(cp1252)")||die; print "He said, \x{201C}I\x{2019}m r\x{E9}served.\x{201D}"' > sample
...
$ uniquote -vE macroman sample
He said, \N{LATIN SMALL LETTER I WITH GRAVE}I\N{LATIN SMALL LETTER I WITH ACUTE}m r\N{LATIN CAPITAL LETTER E WITH GRAVE}served.\N{LATIN SMALL LETTER I WITH CIRCUMFLEX}
$ perl -wle 'binmode(STDOUT, "encoding(cp1252)")||die; print "He said, \x{201C}I\x{2019}m r\x{E9}served.\x{201D}"' | uniquote --encoding cp1252
He said, \N{U+201C}I\N{U+2019}m r\N{U+E9}served.\N{U+201D}
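For readers without uniquote installed, the same guess-the-encoding experiment can be approximated with Python's standard codecs. This is a rough sketch, not a reimplementation of uniquote: the `describe` helper is hypothetical, and it only mimics the named-character style of the `-v` output shown above.

```python
import unicodedata

# The sample sentence from the transcripts, encoded as cp1252 bytes.
text = "He said, \u201cI\u2019m r\u00e9served.\u201d"
raw = text.encode("cp1252")

def describe(data: bytes, encoding: str) -> str:
    """Decode `data` as `encoding` and spell out every non-ASCII character."""
    out = []
    for ch in data.decode(encoding, errors="replace"):
        if ord(ch) < 128:
            out.append(ch)
            continue
        name = unicodedata.name(ch, None)
        # Control characters have no Unicode name; fall back to the code point.
        out.append("\\N{%s}" % name if name else "\\N{U+%04X}" % ord(ch))
    return "".join(out)

if __name__ == "__main__":
    for candidate in ("cp1252", "latin-1", "mac_roman"):
        print("%-10s %s" % (candidate, describe(raw, candidate)))
```

Only the cp1252 reading yields quotation marks and an accented e. The latin-1 reading produces unnamed control characters for the quote bytes, and the mac_roman reading produces the same stray accented letters seen in the macroman transcript.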