in reply to Re: Detecting charset in email
in thread Detecting charset in email

What are the chances that email servers use ASCII if the charset is not defined in the header? I have tried sending emails from several email accounts including Hotmail, Gmail, Operamail and a university account. Some of them use ASCII and some of them use a Unicode encoding, but none of them specify what they are using. Do you have any more ideas?

Re^3: Detecting charset in email
by jhourcle (Prior) on Jun 27, 2005 at 16:02 UTC

    The mail servers should pass through anything without significant inspection (see RFC821 and RFC2821).

    The problem could be either a misconfigured mail client that generated the message, or that the encoding is declared somewhere you haven't looked yet. For instance, with MIME (RFC2045, http://www.faqs.org/rfcs/rfc2045.html) there's an additional set of headers, such as Content-Type and Content-Transfer-Encoding, and RFC2047 covers encoded text inside the header fields themselves. You may need to inspect these, particularly if it's a multipart message, as each part may have a separate encoding.
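    As a rough illustration of checking each part, here is a minimal sketch using Python's standard email package (not the tooling discussed in this thread). The helper names declared_charsets and header_charsets and the sample message are made up for the example. A result of None means the part declares no charset, in which case RFC2045 says to assume US-ASCII.

```python
from email import message_from_bytes, policy
from email.header import decode_header


def declared_charsets(raw: bytes):
    """Return (content_type, charset) for every non-container MIME part.

    charset is None when the part's Content-Type carries no charset parameter.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold parts; they have no text of their own
        found.append((part.get_content_type(), part.get_content_charset()))
    return found


def header_charsets(value: str):
    """Return the charsets named in RFC2047 encoded words of a raw header value."""
    return [cs.lower() for _, cs in decode_header(value) if cs]


# Hypothetical two-part message, each part with its own charset.
SAMPLE = (
    b"From: a@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
    b"--XX\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<p>caf\xc3\xa9</p>\r\n"
    b"--XX--\r\n"
)

if __name__ == "__main__":
    for content_type, charset in declared_charsets(SAMPLE):
        print(content_type, charset)
    print(header_charsets("=?ISO-8859-1?Q?caf=E9?="))
```

    Running it against a saved copy of one of the test messages should show quickly whether the charset really is missing or is just declared per part.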

    If there's still no encoding specified, and the content is not US-ASCII, then you're dealing with a BROKEN mail client, as it's not conforming to the protocols for generating internet e-mail. You can either inform the manufacturer of the mistake, or take a guess at what they intended the message to be. (It need not be a completely random guess: you can look for the 'X-Mailer' or 'User-Agent' headers and try to infer from those, or look through the content for patterns indicating what it might be.)
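    One way to make that guess less wild, sketched in Python under some loud assumptions: the names guess_charset and client_hint are hypothetical, and the windows-1252 fallback is just a common choice for Western European mail, not something the RFCs prescribe. The idea is that pure 7-bit content is US-ASCII, bytes that decode cleanly as UTF-8 are very likely UTF-8, and anything else needs a fallback (which the client header may help you pick).

```python
from email import message_from_bytes


def guess_charset(body: bytes, fallback: str = "windows-1252") -> str:
    """Heuristic guess for an undeclared charset; this can be wrong.

    Strict decoding is tried from the most to the least restrictive encoding.
    The fallback is an assumption, not a detection result.
    """
    for candidate in ("us-ascii", "utf-8"):
        try:
            body.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    return fallback


def client_hint(raw: bytes):
    """Return the X-Mailer or User-Agent header value, or None if neither is set."""
    msg = message_from_bytes(raw)
    return msg.get("X-Mailer") or msg.get("User-Agent")


if __name__ == "__main__":
    print(guess_charset(b"plain text"))
    print(guess_charset("caf\u00e9".encode("utf-8")))
    print(guess_charset(b"caf\xe9"))
    print(client_hint(b"X-Mailer: ExampleMail 1.0\r\n\r\nhi"))
```

    Single-byte encodings such as ISO-8859-1 and windows-1252 accept almost any byte sequence, so they cannot be told apart this way; that is where the client header or knowledge of the sender has to come in.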

    There are some older encodings that don't advertise what they are in the headers, because they map within 7bit space (BinHex, uuencode, vCard, PGP, etc.).
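    Since these formats announce themselves in the body rather than the headers, one option is to scan for their well-known marker lines. A small Python sketch follows; the function name embedded_formats and the exact patterns are my own choices for illustration, and the patterns are deliberately loose, so treat a match as a hint only.

```python
import re

# Marker lines that conventionally open each format inside a message body.
MARKERS = {
    "uuencode": re.compile(r"^begin [0-7]{3,4} \S.*$", re.MULTILINE),
    "binhex": re.compile(r"^\(This file must be converted with BinHex", re.MULTILINE),
    "pgp": re.compile(r"^-----BEGIN PGP [A-Z ]+-----\s*$", re.MULTILINE),
    "vcard": re.compile(r"^BEGIN:VCARD\s*$", re.MULTILINE | re.IGNORECASE),
}


def embedded_formats(body: str) -> list[str]:
    """Return the names of 7-bit formats whose opening marker appears in body."""
    return [name for name, pattern in MARKERS.items() if pattern.search(body)]


if __name__ == "__main__":
    print(embedded_formats("see attached\nbegin 644 notes.txt\nM...\nend\n"))
```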