GaijinPunch has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks:

Is there a module that happens to have an easily accessible Unicode Kanji Table in it? Perhaps stored in a hash? I've got this link here:
http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml

I basically need to loop through the table and do something with each Unicode character. The problem is that the code points are given in hex, so I figured traditional loops were no good. EDIT: I see you can loop using hex numbers. That looks like the easiest way right now.

Re: Unicode Kanji Table?
by halley (Prior) on Jun 28, 2005 at 04:11 UTC
    You might look into Lingua::JP::Kanjidic. It provides access to the KANJIDIC data file, which contains Unicode (utf8) strings to describe about 2200 kanji. I used this module (along with Image::Magick) in preparation of a set of small flashcards.

    --
    [ e d @ h a l l e y . c c ]

Re: Unicode Kanji Table?
by graff (Chancellor) on Jun 28, 2005 at 04:37 UTC
    Do you mean that you want to "do something" with each utf8-encoded character in the range from U+4E00 to U+9FAF?

    Yes, you can use numbers expressed in hex to control your loop:

    for ( 0x4e00 .. 0x9faf ) {
        my $char = chr($_);
        # do something with $char...
    }
    (update: cleaned up first paragraph)
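A slightly fuller sketch of graff's loop, in case it helps to see each code point next to its character. The helper name kanji_lines and the U+XXXX output format are my own illustrative choices (not from any module), and the demo prints only the first three code points rather than the whole U+4E00..U+9FAF range.

```perl
use strict;
use warnings;

# Return "U+XXXX<TAB>char" for every code point from $from to $to.
# The bounds are plain numbers; writing them in hex is only notation.
sub kanji_lines {
    my ( $from, $to ) = @_;
    return map { sprintf( "U+%04X\t%s", $_, chr($_) ) } $from .. $to;
}

# chr() returns wide characters, so encode STDOUT as UTF-8;
# without this layer print warns "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

# Demo: only the first three code points of the block.
print "$_\n" for kanji_lines( 0x4e00, 0x4e02 );
```

Calling kanji_lines( 0x4e00, 0x9faf ) instead gives one line for each of the 20912 code points in graff's range.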
      Yes, I went with something along those lines, graff. I didn't know Perl could loop through HEX numbers so easily. That's what I get for thinking.

      There are a lot of online kanji databases, but the URL always seems to be based on each character's Unicode value, so I needed these values to cycle through each URL and suck the data into my own database (making flashcards... like what halley was up to, I'm sure). :)

      Thanks guys.
      GP
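GP's plan of turning each code point into a lookup URL could be sketched like this. The site, the lookup path, and the code=XXXX query parameter are all hypothetical placeholders, since the thread doesn't say which database is being scraped; lookup_urls is likewise just an illustrative name.

```perl
use strict;
use warnings;

# HYPOTHETICAL URL template: the thread doesn't name the site, so
# replace this with the real lookup URL you are scraping.
my $template = 'http://kanji.example.com/lookup?code=%04X';

# Build one lookup URL per code point from $from to $to.
sub lookup_urls {
    my ( $from, $to ) = @_;
    return map { sprintf( $template, $_ ) } $from .. $to;
}

print "$_\n" for lookup_urls( 0x4e00, 0x4e02 );
```

To actually fetch each page you could hand these URLs to something like LWP::Simple's get(), ideally with a short sleep between requests so you don't hammer the site.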
        I didn't know Perl could loop through HEX numbers so easily.

        Numbers are numbers. You can express them in hex, octal or decimal; the interpreter just stores the value in its native numeric format. It usually doesn't even remember which notation you used.

        U:\> perl -MO=Deparse -e "print 0x1234"
        print 4660;
        -e syntax OK

        --
        [ e d @ h a l l e y . c c ]
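To illustrate halley's point, a small sketch of my own: the notation of a numeric literal never matters, but hex digits read as text, such as the code points listed on the rikai.com table, are strings until you convert them with hex() (or oct(), which also accepts a 0x prefix). The variable names here are only illustrative.

```perl
use strict;
use warnings;

# One value, four notations: the parser turns each literal into the
# same number, so none of these differs from 4660.
my @same = ( 0x1234, 4660, 011064, 0b1001000110100 );
print "all four literals equal 4660\n" unless grep { $_ != 4660 } @same;

# Hex digits read as *text* (e.g. "4E00" copied from a table) are a
# different matter: they stay strings until converted with hex() or oct().
my $from_table = '4E00';
my $cp = hex $from_table;
printf "%s -> %d -> U+%04X\n", $from_table, $cp, $cp;

# oct() also understands a leading "0x" (and "0b" for binary).
print "oct agrees\n" if oct('0x4E00') == $cp;
```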