I think it works in most cases, as long as all the texts you want to test are converted to Perl's internal representation of strings (with the utf8 flag on).

    sub identify_CJK {
        local $_ = shift;
        return "J" if /\p{Hiragana}|\p{Katakana}/;
        return "K" if /\p{Hangul}/;
        return "C" if /\p{Han}/;
        return "Others";
        # Note that the order matters because Japanese text
        # most likely contains Hanzi (Kanji) characters and
        # so does Korean text (less frequently though).
    }
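The caveat about Perl's internal representation is easy to trip over, so here is a minimal sketch of what it might look like in practice. It assumes the input arrives as raw UTF-8 bytes (from a file or socket, say) and uses the core Encode module to decode it first. The wrapper name identify_CJK_bytes is hypothetical, made up for this illustration; identify_CJK is the sub from above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The classifier from the post, behaviour unchanged.
sub identify_CJK {
    local $_ = shift;
    return "J" if /\p{Hiragana}|\p{Katakana}/;
    return "K" if /\p{Hangul}/;
    return "C" if /\p{Han}/;
    return "Others";
}

# Hypothetical helper (not part of the original post): decode raw
# UTF-8 bytes into a Perl character string, then classify it.
sub identify_CJK_bytes {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    return identify_CJK($text);
}

# Raw UTF-8 bytes for a five-character Hiragana greeting ("konnichiwa").
my $raw = "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF";

print identify_CJK($raw), "\n";        # undecoded bytes: prints "Others"
print identify_CJK_bytes($raw), "\n";  # decoded text:    prints "J"
```

Without the decode step, each byte is treated as a separate Latin-1 character, so none of the \p{...} script properties match and everything falls through to "Others". If you read from a filehandle, opening it with an :encoding(UTF-8) layer should have the same effect as calling decode by hand.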
In reply to Re^2: How to Identify a language by Anonymous Monk, in thread How to Identify a language by moshkod