in reply to Re: Japanese: detect hiragana/katakana/full width eisuuji
in thread Japanese: detect hiragana/katakana/full width eisuuji

Thanks, that helps, although the full-width roman characters are the ones giving me unusable results. :|

Re^3: Japanese: detect hiragana/katakana/full width eisuuji
by ikegami (Patriarch) on Feb 01, 2009 at 06:29 UTC

    http://unicode.org/charts/PDF/UFF00.pdf

    To detect:

    [\x{FF01}-\x{FF60}\x{FFE0}-\x{FFE6}]    Full-width ASCII variants, brackets and symbols
    [\x{FF01}-\x{FF5E}]                     Full-width ASCII variants
    [\x{FF21}-\x{FF3A}]                     Full-width ASCII uppercase letters
    [\x{FF41}-\x{FF5A}]                     Full-width ASCII lowercase letters
    [\x{FF10}-\x{FF19}]                     Full-width ASCII digits
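
    A rough sketch of how those classes might be used. The helper name fullwidth_kinds is made up for illustration, and the input is assumed to be a decoded character string (not raw UTF-8 or Shift_JIS bytes); on undecoded bytes none of these classes will match.

```perl
use strict;
use warnings;

# Hypothetical helper: report which kinds of full-width ASCII
# characters occur in a decoded (character) string.
sub fullwidth_kinds {
    my ($str) = @_;
    my @kinds;
    push @kinds, 'uppercase' if $str =~ /[\x{FF21}-\x{FF3A}]/;
    push @kinds, 'lowercase' if $str =~ /[\x{FF41}-\x{FF5A}]/;
    push @kinds, 'digits'    if $str =~ /[\x{FF10}-\x{FF19}]/;
    return @kinds;
}

# "\x{FF21}" is FULLWIDTH LATIN CAPITAL LETTER A,
# "\x{FF11}" is FULLWIDTH DIGIT ONE.
my @kinds = fullwidth_kinds("\x{FF21}bc\x{FF11}");
print "@kinds\n";    # prints "uppercase digits"
```

    If the text arrives as bytes, decoding it first (for example with Encode's decode) would be needed before matching.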

    To convert:

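    # Keys are full-width characters, values are their narrow counterparts.
    my %fullwidth_to_narrow = map chr, (
        # U+FF01..U+FF5E sit at a fixed offset from ASCII 0x21..0x7E
        ( map { $_ => $_-0xFF01+0x21 } 0xFF01..0xFF5E ),
        # Brackets and symbols have no fixed offset, so list them one by one
        0xFF5F => 0x2985, 0xFF60 => 0x2986,
        0xFFE0 => 0x00A2, 0xFFE1 => 0x00A3, 0xFFE2 => 0x00AC,
        0xFFE3 => 0x00AF, 0xFFE4 => 0x00A6, 0xFFE5 => 0x00A5,
        0xFFE6 => 0x20A9,
    );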
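
    One way the table might then be applied is a single global substitution over the widest detection class. This is only a sketch: to_narrow is a hypothetical helper name, the hash is rebuilt here so the snippet runs on its own, and the input is again assumed to be a decoded character string.

```perl
use strict;
use warnings;

# Same mapping as above, repeated so this snippet is self-contained.
my %fullwidth_to_narrow = map chr, (
    ( map { $_ => $_ - 0xFF01 + 0x21 } 0xFF01 .. 0xFF5E ),
    0xFF5F => 0x2985, 0xFF60 => 0x2986,
    0xFFE0 => 0x00A2, 0xFFE1 => 0x00A3, 0xFFE2 => 0x00AC,
    0xFFE3 => 0x00AF, 0xFFE4 => 0x00A6, 0xFFE5 => 0x00A5,
    0xFFE6 => 0x20A9,
);

# Hypothetical helper: replace every full-width character covered by
# the table with its narrow counterpart; everything else is untouched.
sub to_narrow {
    my ($str) = @_;
    $str =~ s/([\x{FF01}-\x{FF60}\x{FFE0}-\x{FFE6}])/$fullwidth_to_narrow{$1}/g;
    return $str;
}

# Full-width "Hello!" written with escapes to avoid source-encoding issues.
print to_narrow("\x{FF28}\x{FF45}\x{FF4C}\x{FF4C}\x{FF4F}\x{FF01}"), "\n";    # prints "Hello!"
```

    The character class in the substitution matches exactly the keys of the hash, so the lookup should never come back undefined.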