in reply to Re^4: Why does Encode::Repair only correctly fix one of these two tandem characters?

The most common garbage from Perl code is a mix of UTF-8 and latin-1. It happens when you forget to specify the output encoding.

print "\N{LATIN CAPITAL LETTER E WITH ACUTE}";
print "\N{BLACK SPADE SUIT}";

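The first string contains only characters below 256, so Perl writes each one as a single byte and has no way of knowing you did something wrong. The second string contains a character that doesn't fit in a byte, so Perl guesses you meant to encode it as UTF-8 (and emits a "Wide character" warning if warnings are enabled). You end up with a mix of raw code points (effectively latin-1) and UTF-8.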
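To see the mix at the byte level, here is a minimal sketch. The helper bytes_printed is hypothetical (not from any module); it prints to an in-memory filehandle, which I'm assuming behaves like a STDOUT with no encoding layer. It also shows what you get once a :encoding(UTF-8) layer is set.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: return the bytes that print would emit for $string.
# $layer is an optional PerlIO layer such as ':encoding(UTF-8)'.
sub bytes_printed {
    my ($string, $layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $string;
    }
    close $fh or die "Cannot close in-memory handle: $!";
    return $buffer;
}

my $e_acute = "\N{LATIN CAPITAL LETTER E WITH ACUTE}";
my $spade   = "\N{BLACK SPADE SUIT}";

# No layer: one latin-1 byte for the E acute, three UTF-8 bytes for the spade.
my $mixed = bytes_printed($e_acute) . bytes_printed($spade);

# Explicit layer: both characters are encoded as UTF-8.
my $clean = bytes_printed($e_acute . $spade, ':encoding(UTF-8)');

print unpack('H*', $mixed), "\n";    # c9e299a0
print unpack('H*', $clean), "\n";    # c389e299a0
```

In a real script the usual way to avoid the problem is binmode STDOUT, ':encoding(UTF-8)'; (or use open qw(:std :encoding(UTF-8));) before printing.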

Output like this can be repaired with Encoding::FixLatin.
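A short sketch of what that looks like, assuming Encoding::FixLatin is installed from CPAN. The input bytes are the same mix the two print statements above would produce; the variable names are just for illustration.

```perl
use strict;
use warnings;
use Encoding::FixLatin qw(fix_latin);

# Mixed input: E acute as a single latin-1 byte, then the spade as three UTF-8 bytes.
my $mixed = "\xC9" . "\xE2\x99\xA0";

# By default fix_latin returns a Perl character string, not bytes.
my $fixed = fix_latin($mixed);

# Encode on the way out so the repaired text is written as clean UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$fixed\n";
```

Note that this is a heuristic: a latin-1 byte that happens to be followed by bytes forming a valid UTF-8 sequence with it will be read as UTF-8, so it works best on text where such collisions are unlikely.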
