Here you are moving away from strictly orthographic matters into phonetics or phonology. Those are essentially context-dependent, and that takes you out of the domain of simply classifying letter symbols into related groups, which is essentially context-independent.
If the goal is to provide a means for correct word segmentation of Thai text, the context-dependent rules (such as "rr" becoming "un") should probably be handled in a separate module. The functions that work on sequences of characters will depend on the functions that define the basic character classes.
(You probably could put the subroutines for character-classes and context-dependent rules together in one module if you want to, but the two sets of subroutines will have very different usages from the caller's point of view. And the overall problem being addressed is probably complicated enough that you will want to segregate portions of the solution into separate modules anyway.)
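To make the two-layer split concrete, here is a minimal sketch. It is written in Python for brevity; in Perl the two layers would naturally become two separate packages. Layer 1 answers questions about a single character with no context, using the Unicode Thai consonant range U+0E01 to U+0E2E (minus the vowel letters RU and LU, which sit inside that range). Layer 2 applies a context-dependent rewrite and is built on top of layer 1. All function names are hypothetical. Reading the thread's "rr" becomes "un" as RO RUA + RO RUA becoming MAI HAN-AKAT + NO NU is an interpretation on my part. The context condition ("only after a consonant") is an illustrative simplification, not a complete account of how Thai treats a doubled RO RUA.

```python
# --- Layer 1: context-free character classes (one character in, one answer out) ---

THAI_CONSONANT_FIRST = "\u0e01"  # KO KAI
THAI_CONSONANT_LAST = "\u0e2e"   # HO NOKHUK
# RU (U+0E24) and LU (U+0E26) lie inside the consonant range but are vowel letters.
VOWEL_LETTERS_IN_CONSONANT_RANGE = {"\u0e24", "\u0e26"}


def is_thai_consonant(ch: str) -> bool:
    """Return True if ch is a Thai consonant letter. Looks at ch alone, no context."""
    return (
        len(ch) == 1
        and THAI_CONSONANT_FIRST <= ch <= THAI_CONSONANT_LAST
        and ch not in VOWEL_LETTERS_IN_CONSONANT_RANGE
    )


# --- Layer 2: context-dependent rewrite rules, which depend on layer 1 ---

RO_RUA = "\u0e23"        # the Thai letter usually romanized "r"
NO_NU = "\u0e19"         # the Thai letter usually romanized "n"
MAI_HAN_AKAT = "\u0e31"  # the vowel mark standing in for the thread's "u"


def rewrite_double_ro(text: str) -> str:
    """Hypothetical rule: a doubled RO RUA that directly follows a consonant
    is rewritten as MAI HAN-AKAT + NO NU. The context check consults the
    ORIGINAL text via the layer-1 predicate."""
    out = []
    i = 0
    while i < len(text):
        if (
            text.startswith(RO_RUA * 2, i)
            and i > 0
            and is_thai_consonant(text[i - 1])
        ):
            out.append(MAI_HAN_AKAT + NO_NU)
            i += 2  # consume both RO RUA characters
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

The caller-facing difference shows up in the signatures: `is_thai_consonant` takes one character and can be reused by any rule, while `rewrite_double_ro` takes a whole string because its decision depends on neighbouring characters. Adding more rules means adding more layer-2 functions, and layer 1 stays untouched.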
Just curious: have you looked at Lingua::TH::Segmentation? I just happened to notice it was there, but I haven't tried it. Have you?
In reply to Re^3: Creating new character classes for foreign languages by graff, in thread Creating new character classes for foreign languages by Polyglot