As per perlunicode,
\p{Hiragana} will match a hiragana character when used in a regexp.
\p{Katakana} will match a katakana character when used in a regexp.
\p{Han} will match a kanji character when used in a regexp. (Han is the whole unified script, so it also matches Chinese hanzi and Korean hanja.)
You can also negate those, with \P{...} (capital P) or \p{^...}. See the referenced document.
Note that the text must have been decoded first: with :encoding() (via open, binmode or use open) for file handles, with utf8::decode() or Encode::decode() for strings you already hold, or with use utf8; for literals in the source.
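To illustrate the decoding note, here is a minimal sketch. The byte string is a made-up input (the UTF-8 encoding of the four hiragana used further down), standing in for data read from a handle with no :encoding() layer. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: raw UTF-8 bytes for U+3072 U+3089 U+304C U+306A,
# as they would arrive from an undecoded file handle.
my $bytes = "\xE3\x81\xB2\xE3\x82\x89\xE3\x81\x8C\xE3\x81\xAA";

# Without decoding, each byte is treated as a separate character,
# so nothing is recognised as hiragana.
my $raw_hits = () = $bytes =~ /\p{Hiragana}/g;

# After decoding, the string holds four real hiragana characters.
my $text = decode('UTF-8', $bytes);
my $decoded_hits = () = $text =~ /\p{Hiragana}/g;

printf "undecoded: %d, decoded: %d\n", $raw_hits, $decoded_hits;
```

The same pattern finds nothing in the raw bytes and four characters once the string is decoded, which is why the decoding step has to come first.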
use strict;
use warnings;
use open ':std', ':locale';

$_ = <<"__EOI__";
\x{6F22}\x{5B57}
\x{3072}\x{3089}\x{304C}\x{306A}
\x{30AB}\x{30BF}\x{30AB}\x{30CA}
__EOI__

my $hiragana = join ' ', /\p{Hiragana}+/g;
my $katakana = join ' ', /\p{Katakana}+/g;
my $kanji    = join ' ', /\p{Han}+/g;

print("hiragana: $hiragana\n");
print("katakana: $katakana\n");
print("kanji: $kanji\n");
hiragana: ひらがな
katakana: カタカナ
kanji: 漢字
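As for negating the properties, something along these lines should work. It is only a sketch: the sample string (two kanji followed by three hiragana) and the variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample: two Han characters followed by three hiragana.
my $text = "\x{6F22}\x{5B57}\x{3068}\x{304B}\x{306A}";

# \P{Han} (capital P) matches any character that is NOT Han.
my $non_han = () = $text =~ /\P{Han}/g;

# \p{^Han} is an equivalent spelling of the same negation.
my $non_han_alt = () = $text =~ /\p{^Han}/g;

# Inside a bracketed class, [^...] negates the union of the properties:
# here, anything that is neither hiragana nor katakana.
my $not_kana = () = $text =~ /[^\p{Hiragana}\p{Katakana}]/g;

printf "non-Han: %d, non-Han (alt): %d, not kana: %d\n",
    $non_han, $non_han_alt, $not_kana;
```

Both negated forms count the three hiragana, and the bracketed class counts the two kanji.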
In reply to Re: Japanese: detect hiragana/katakana/fulll width eisuuji
by ikegami
in thread Japanese: detect hiragana/katakana/fulll width eisuuji
by GaijinPunch