in reply to Can't tell if UTF-8... or just binary...

I used to have a text editor that decided whether a file was text or binary by checking whether it contained any null bytes ("\0"). It worked extremely well in practice, since virtually all binary files contain null bytes.
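Here's a minimal sketch of that heuristic. The function names and the 8 KB sample size are my own hypothetical choices, not what that editor actually did; it may well have scanned the whole file.

```python
def looks_binary(data: bytes) -> bool:
    """Null-byte heuristic: treat the data as binary if it contains any 0x00 byte."""
    return b"\x00" in data


def file_looks_binary(path: str, sample_size: int = 8192) -> bool:
    # Hypothetical helper: only the first sample_size bytes are inspected,
    # which is an assumption to keep large files cheap to classify.
    with open(path, "rb") as f:
        return looks_binary(f.read(sample_size))


print(looks_binary(b"hello, world\n"))                        # False
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))  # True (start of a PNG header)
```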

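It works just as well with Unicode text, at least if it's UTF-8, because UTF-8 never produces a null byte for any character other than U+0000 itself. 16-bit (UTF-16) and 32-bit (UTF-32) Unicode text, on the other hand, contains a lot of null bytes: for mostly ASCII text, typically every other byte in 16-bit encodings and 3 out of every 4 bytes in 32-bit encodings.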
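To see the difference, this rough sketch counts the fraction of zero bytes in the same ASCII string under each encoding. `null_fraction` is a hypothetical helper, and I picked the little-endian codecs (`utf-16-le`, `utf-32-le`) only so that no byte-order mark skews the counts.

```python
def null_fraction(text: str, encoding: str) -> float:
    """Return the fraction of bytes that are 0x00 after encoding text."""
    data = text.encode(encoding)
    return data.count(0) / len(data) if data else 0.0


for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, null_fraction("plain ASCII text", enc))
# utf-8 0.0
# utf-16-le 0.5
# utf-32-le 0.75
```

Those exact ratios only hold for ASCII-range characters. CJK text in UTF-16 mostly has no zero bytes at all (`null_fraction("日本語", "utf-16-le")` is 0.0), so a null-byte check can miss UTF-16 text in that case.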