I have a UTF-8 encoded file containing text in several languages (Chinese, Japanese, French, etc.). Each paragraph is in a single language, though. Is there an easy way to tell automatically whether a paragraph is written in Chinese or French?
For CJK, the Unicode characters seem to be mixed together in the code charts; there doesn't seem to be a clear block boundary between the languages.
This is a different problem from detecting the encoding, since all of the text is already encoded in UTF-8.
I just want a function that returns the language when given a piece of UTF-8 encoded Unicode text.
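To make the request concrete, here is a rough Python sketch of the kind of function I have in mind. It is only a script-based heuristic, not a real language detector: the name `guess_language`, the return labels, and the rule "any kana means Japanese" are my own assumptions. It also shows the limitation I am running into: Han characters are shared between Chinese and Japanese, and Latin letters alone cannot tell French from English, so those cases would presumably need something statistical.

```python
import unicodedata


def guess_language(text: str) -> str:
    """Crude guess based only on which Unicode scripts appear in the text.

    Hypothetical labels: "ja", "ko", "zh", "latin", "unknown".
    """
    kana = han = hangul = latin = 0
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana blocks
            kana += 1
        elif 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs (shared by Chinese and Japanese)
            han += 1
        elif 0xAC00 <= cp <= 0xD7A3:      # Hangul syllables
            hangul += 1
        elif ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
            latin += 1

    # Assumption: each paragraph is in one language, so any kana implies Japanese.
    if kana:
        return "ja"
    if hangul:
        return "ko"
    if han:
        return "zh"
    if latin:
        return "latin"   # French, English, etc. are indistinguishable by script alone
    return "unknown"


def guess_language_utf8(data: bytes) -> str:
    """Same guess, starting from raw UTF-8 bytes."""
    return guess_language(data.decode("utf-8"))
```

With this, a paragraph containing hiragana or katakana comes back as "ja" and a Han-only paragraph as "zh", but a French paragraph is only reported as "latin", which is not good enough for what I need.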
Any ideas?