and accurately detect their encoding/charset, and reliably convert them to utf-8.
While you can sometimes do a good job, it can't be done reliably. This is a rescue/emergency tactic for when you're confronted with broken data. Different character sets overlap, sometimes heavily, in the byte sequences they use, and a single byte of garbage can wreck detection of an otherwise obvious, valid guess. The modules you list are the way to go, but the two descriptions of this problem you've posted make it feel like an XY problem.
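For what it's worth, here is a minimal sketch of the sort of thing those modules give you, using Encode::Guess with a hypothetical input file. Note how fast the guess collapses into "ambiguous" once you add overlapping legacy suspects; that is exactly the problem described above.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw(encode);
    use Encode::Guess;    # checks ascii, utf8 and BOMed UTFs by default

    # Slurp the raw octets of a file whose encoding is unknown.
    my $file = shift or die "usage: $0 <file>\n";
    open my $fh, '<:raw', $file or die "$file: $!";
    my $octets = do { local $/; <$fh> };
    close $fh;

    # Add cp1252 as a suspect. Adding more overlapping suspects
    # (latin1, cp1252, ...) quickly makes the result "ambiguous".
    my $decoder = guess_encoding( $octets, 'cp1252' );

    if ( ref $decoder ) {
        my $text = $decoder->decode($octets);   # characters, not bytes
        print encode( 'UTF-8', $text );         # write back out as UTF-8
    }
    else {
        # On failure you get a diagnostic string, not an encoding object.
        die "Could not guess encoding for $file: $decoder\n";
    }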
It's only tangentially related, but I recommend reading this (🐪🐫🐪🐫🐪: Why does modern Perl avoid UTF-8 by default?) many times. While there is always room for improvement in any endeavor, I suspect digging in and seeing how deep the problems actually run may sober your drive to add to the toolset. Go code diving in those modules, and add the Unicode::Tussle scripts to the pile if you are getting through the reading too quickly. :P