in reply to Re^4: Mail::Sender character set/encoding
in thread Mail::Sender character set/encoding

Ideally, there is meta info, either explicit (as in Content-Type, etc.) or implicit (such as where the data originated, as you're saying).  For Unicode encodings, there's also the BOM (byte order mark).
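
To illustrate the BOM case, here's a minimal sketch in Python (not from the thread; the function name sniff_bom is made up for the example).  It only checks whether the data starts with one of the well-known Unicode BOMs, and it says nothing about data that has no BOM at all.

```python
import codecs

# Check the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones,
# because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

If a BOM is found, something like data[length:].decode(name) would decode the rest.  Note that a leading FF FE 00 00 could in theory also be UTF-16 LE text starting with a NUL character, so even this is a (very good) guess rather than a proof.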

In case there isn't, we humans (as opposed to computers) typically excel at figuring out which encoding is being used, at least if we understand the language the text is in.  That's mainly because our mind has abilities and resources (like world knowledge) that AI is still struggling with...  As we can usually tell that only certain characters make sense in certain positions, we can check the byte values found there against the documented code tables, and arrive at a decision (or at least an educated guess) rather soon.
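
A rough way to support that kind of manual check is to decode the same bytes under a few candidate encodings and look at which result reads sensibly.  The following is just a sketch in Python under that assumption; the function name and the list of candidates are made up for the example, and the judging is still left to the human.

```python
def decode_candidates(data: bytes,
                      encodings=("utf-8", "iso-8859-1", "cp1252", "koi8-r")):
    """Decode data under each candidate encoding.

    Returns a dict mapping encoding name to the decoded string,
    or to None if the bytes are not valid in that encoding.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results

if __name__ == "__main__":
    # Bytes 0xFC and 0xDF are u-umlaut and sharp s in ISO-8859-1.
    for enc, text in decode_candidates(b"Gr\xfc\xdfe").items():
        print(f"{enc:12} {text!r}")
```

For these sample bytes, UTF-8 is ruled out outright (the bytes are not valid UTF-8), while several single-byte encodings decode without error.  Telling those apart is exactly the part where knowing the language helps.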
