I don't know much about how encodings work, but if what we've finally got, after all those years of extended ASCII charsets, is a universal way of representing characters using multiple bytes, shouldn't UTF-8 then be _the_ encoding, and thus always be specified in the relevant documents?
Let me guess... There's still no all-encompassing way of representing every known character, and 32 bits (4,294,967,296 possible values) still isn't enough.
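For what it's worth, here is a quick Python sketch showing that the size argument doesn't hold up. It assumes Python 3.8+ (for `bytes.hex` with a separator), and the sample characters are just arbitrary picks from different Unicode ranges. UTF-8 encodes every Unicode code point in 1 to 4 bytes, and the Unicode code space stops at U+10FFFF (21 bits), far below 2**32.

```python
# Encode characters from different Unicode ranges (arbitrary samples) and
# show how many bytes UTF-8 needs for each.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")

# The highest Unicode code point is U+10FFFF; it still fits in 4 UTF-8 bytes.
max_cp = chr(0x10FFFF)
print(f"U+10FFFF: {len(max_cp.encode('utf-8'))} byte(s)")

# Anything above U+10FFFF is not a valid code point at all.
try:
    chr(0x110000)
except ValueError as exc:
    print("chr(0x110000) fails:", exc)
```

So ASCII text stays 1 byte per character, accented Latin letters take 2, the euro sign takes 3, and emoji take 4.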