I'll keep this short and sweet. I'm running Jcode and Kakasi over a rather sizeable list of strings. Kakasi behaves differently depending on whether you pass it hiragana or kanji. Is there a module that will tell you what kind of character a given double-byte character is? For this particular script, everything ends up as UTF-8. I guess the non-lazy way would be to check both bytes against a table. But hey, it's Sunday. What can I say?
I know EUC characters fall into nice ranges (hiragana, katakana, and full-width eisuuji, i.e. alphanumerics, coming first, if memory serves). Guess I should dig around and take a closer look at UTF-8.
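
To make the "check against a table" idea concrete, here is a rough sketch of what I have in mind, assuming the strings are decoded from UTF-8 first so the lookup works on code points rather than raw bytes. It is written in Python purely for illustration. The names `RANGES`, `classify_char`, and `classify_bytes` are made up for this example, and the ranges are the standard Unicode blocks (only the basic CJK Unified Ideographs block is covered for kanji).

```python
# Hypothetical lookup table: (first code point, last code point, label).
# The ranges are the standard Unicode blocks; the labels are my own.
RANGES = [
    (0x3040, 0x309F, "hiragana"),
    (0x30A0, 0x30FF, "katakana"),
    (0x4E00, 0x9FFF, "kanji"),            # basic CJK Unified Ideographs only
    (0xFF10, 0xFF19, "fullwidth_alnum"),  # full-width digits
    (0xFF21, 0xFF3A, "fullwidth_alnum"),  # full-width A-Z
    (0xFF41, 0xFF5A, "fullwidth_alnum"),  # full-width a-z
]


def classify_char(ch):
    """Return a label for a single decoded character."""
    cp = ord(ch)
    for lo, hi, name in RANGES:
        if lo <= cp <= hi:
            return name
    return "other"


def classify_bytes(raw):
    """Decode a UTF-8 byte string and label each character in it."""
    return [classify_char(ch) for ch in raw.decode("utf-8")]
```

In Perl, the same check could probably be done without a hand-built table by matching decoded strings against Unicode property classes such as `\p{Hiragana}`, `\p{Katakana}`, and `\p{Han}`, though I have not tried that against the Jcode/Kakasi pipeline.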