and accurately detect their encoding/charset, and reliably convert them to utf-8.

While you can sometimes do a good job, it can't be done reliably. Guessing is a rescue/emergency tactic for when you're confronted with broken data. Different character sets overlap in the bytes they use, sometimes a lot, so the same byte sequence can be perfectly valid in several of them. A single byte of garbage can wreck detection of an otherwise obvious/valid guess. The modules you list are the way to go, but the two descriptions of this problem you've posted make it feel like an XY problem.
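To make the overlap concrete, here is a minimal sketch using the core Encode module. The helper name try_decodings and the candidate list are made up for illustration; they are not from any of the modules mentioned in the thread. It decodes the same bytes strictly under several encodings, then does it again with one stray byte appended.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $bytes strictly under each candidate encoding.
# Returns a hashref of encoding => code point string, or undef where the
# bytes are malformed for that encoding.
sub try_decodings {
    my ($bytes, @encodings) = @_;
    my %result;
    for my $enc (@encodings) {
        my $copy = $bytes;    # decode may modify its input when CHECK is set
        my $text = eval { decode($enc, $copy, Encode::FB_CROAK) };
        $result{$enc} = defined $text
            ? join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $text))
            : undef;
    }
    return \%result;
}

# Assumed candidate list, chosen only to show the overlap.
my @candidates = qw(UTF-8 ISO-8859-1 ISO-8859-7);

# First the clean bytes, then the same bytes with one byte of garbage.
for my $bytes ("\xC3\xA9", "\xC3\xA9\xFF") {
    my $r = try_decodings($bytes, @candidates);
    printf "%-11s %s\n", $_, $r->{$_} // 'malformed' for @candidates;
    print "\n";
}
```

The two clean bytes decode without complaint under every candidate, as one character in UTF-8 and as two different characters in each of the single-byte sets, so validity alone tells you nothing about which reading was intended. Append \xFF and the strict UTF-8 decode fails outright, while ISO-8859-1 still happily accepts everything.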

It's only tangentially related, but I recommend reading this (🐪🐫🐪🐫🐪: Why does modern Perl avoid UTF-8 by default?) many times. While there is always room for improvement in any endeavor, I suspect digging in and seeing how deep the problems actually run may sober your drive to add to the toolset. Go code diving in those modules, and add the Unicode::Tussle scripts to the pile if you are getting through the reading too quickly. :P


In reply to Re: How best to avoid mojibake, when attempting to automatically convert documents to utf-8? by Your Mother
in thread How best to avoid mojibake, when attempting to automatically convert documents to utf-8? by taint
