As per perlunicode,
\p{Hiragana} will match a hiragana character when used in a regexp.
\p{Katakana} will match a katakana character when used in a regexp.
\p{Han} will match a kanji character when used in a regexp (more precisely, any character in the Han script, so Chinese hanzi match too).
You can also negate those, either with a capital P (\P{Hiragana}) or with a caret (\p{^Hiragana}). See the referenced document.
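As a minimal sketch of negation (the sample string and variable names are made up for illustration, and UTF-8 output is assumed), both negated forms can be used like this:

```perl
use strict;
use warnings;

# Assumption: the terminal accepts UTF-8 output.
binmode STDOUT, ':encoding(UTF-8)';

# Illustrative input: "abc", two hiragana (U+3072 U+3089), "123", one kanji (U+6F22).
my $text = "abc\x{3072}\x{3089}123\x{6F22}";

# \P{Hiragana} (capital P) matches any character that is NOT hiragana.
my @non_hiragana = $text =~ /\P{Hiragana}+/g;

# \p{^Han} is the caret form of negation: any character that is NOT Han.
my @non_han = $text =~ /\p{^Han}+/g;

print "non-hiragana runs: @non_hiragana\n";
print "non-Han runs: @non_han\n";
```

Here the hiragana splits the string into two non-hiragana runs, while everything before the final kanji forms a single non-Han run.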
Note that the text must have been decoded first: with an :encoding() layer (set via open, binmode or use open), with utf8::decode() or Encode::decode(), or with use utf8; for literals in the source.
use strict;
use warnings;
use open ':std', ':locale';
$_ = <<"__EOI__";
\x{6F22}\x{5B57}
\x{3072}\x{3089}\x{304C}\x{306A}
\x{30AB}\x{30BF}\x{30AB}\x{30CA}
__EOI__
my $hiragana = join ' ', /\p{Hiragana}+/g;
my $katakana = join ' ', /\p{Katakana}+/g;
my $kanji = join ' ', /\p{Han}+/g;
print("hiragana: $hiragana\n");
print("katakana: $katakana\n");
print("kanji: $kanji\n");
Output:
hiragana: ひらがな
katakana: カタカナ
kanji: 漢字
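To illustrate why decoding matters, here is a small sketch, assuming the input arrives as raw UTF-8 bytes (the byte string and variable names are made up for this example). Before decoding, the string is just six byte-valued characters and none of them is hiragana; after Encode::decode() it holds two hiragana characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for U+3072 U+3089 (two hiragana), as they might
# arrive from a filehandle that has no :encoding() layer.
my $bytes = "\xE3\x81\xB2\xE3\x82\x89";

# Undecoded: each byte is treated as its own character, so nothing matches.
my $raw_hits = () = $bytes =~ /\p{Hiragana}/g;

# Decoded: the six bytes become two hiragana characters.
my $chars = decode('UTF-8', $bytes);
my $decoded_hits = () = $chars =~ /\p{Hiragana}/g;

print "before decoding: $raw_hits match(es)\n";
print "after decoding: $decoded_hits match(es)\n";
```

This prints 0 matches before decoding and 2 after.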