You can still keep ranges. There are better ways to represent them; see code below.
You can represent Unicode names against individual codepoints; doing so for ranges of codepoints will be more difficult and possibly messy. I recommend keeping the Unicode PDF Character Code Chart "Thai -- Range: 0E00-0E7F" at hand while developing: it lists the codepoints sequentially, with their glyphs and names, and some entries have additional notes. You might consider adding that link to your module's POD. If you're writing code for other (Unicode) scripts, you can find links to all of the current charts at "Unicode 15.1 Character Code Charts".
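As a rough illustration of looking up names for individual codepoints, here is a minimal sketch using the core `charnames` module. The helper name `names_for` is hypothetical (it is not part of the module shown in this post), and `charnames::viacode` needs a reasonably modern Perl, so this would not suit a 5.6 target.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use charnames ();    # name lookup functions only; nothing is imported
use open OUT => qw{:encoding(UTF-8) :std};

# Hypothetical helper: pair each codepoint with its Unicode name.
# Returns a list of [codepoint, name] arrayrefs.
sub names_for {
    my @codepoints = @_;

    return map {
        my $name = charnames::viacode($_);
        [ $_, defined $name ? $name : '<unnamed>' ]
    } @codepoints;
}

# Print codepoint, glyph and name for a few of the Thai consonants.
for my $pair (names_for(0x0E02, 0x0E03, 0x0E09)) {
    my ($codepoint, $name) = @$pair;
    printf "U+%04X %s %s\n", $codepoint, chr($codepoint), $name;
}
```

This handles one codepoint at a time; a range such as `0E04-0E07` would first have to be expanded into its individual codepoints.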
Having multiple names for the same subroutine is often confusing and generally, in my opinion, a design flaw; however, it's easily achieved with additional keys in the despatch table. I would urge you to reconsider if that's something you really need.
Update: I had just posted when I saw your reply to hippo. Given your explanation, use of multiple names seems valid in this instance.
New script and Module:
```
ken@titan ~/tmp/pm_11155205_uni_char_class
$ ls -l *2*
-rw-r--r-- 1 ken None 1275 Oct 29 01:50 PolyUniCharClass2.pm
-rwxr-xr-x 1 ken None  370 Oct 29 01:42 uni_char_class_2.pl
```
uni_char_class_2.pl:
```perl
#!/usr/bin/env perl

use strict;
use warnings;
use open OUT => qw{:encoding(UTF-8) :std};

use lib '.'; # DEMO ONLY -- DON'T use in PRODUCTION!
use PolyUniCharClass2;

for my $prefix (qw{In Is If}) {
    for my $class (qw{H L M}) {
        my $cons = "${prefix}Thai${class}Cons";
        print "$cons:\n";
        print @{PolyUniCharClass2::list($cons)}, "\n";
    }
}
```
PolyUniCharClass2.pm:
```perl
package PolyUniCharClass2;

use strict;
use warnings;

{
    # Despatch table: the Is* keys are aliases for the In* subroutines.
    my %char_class_despatch = (
        InThaiHCons => \&InThaiHCons,
        InThaiLCons => \&InThaiLCons,
        IsThaiHCons => \&InThaiHCons,
        IsThaiLCons => \&InThaiLCons,
    );

    sub list {
        my ($char_class) = @_;

        unless (exists $char_class_despatch{$char_class}) {
            warn "Char class '$char_class' doesn't exist!\n";
            return [];
        }

        return [map chr, @{$char_class_despatch{$char_class}->()}];
    }
}

{
    my $ThaiHCons = [qw{0E02-0E03 0E09 0E10 0E16}];
    my $ThaiLCons = [qw{0E04-0E07 0E0A-0E0D 0E11}];

    # Cache: each list is expanded only on first use.
    my %ThaiCons_expanded;

    sub InThaiHCons {
        return $ThaiCons_expanded{InThaiHCons} ||= _expand($ThaiHCons);
    }

    sub InThaiLCons {
        return $ThaiCons_expanded{InThaiLCons} ||= _expand($ThaiLCons);
    }
}

{
    my $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

    # Turn hex codepoints and "start-end" hex ranges into a list of integers.
    sub _expand {
        my ($code_range_list) = @_;

        my @full_list;

        for my $range (@$code_range_list) {
            if ($range =~ $re) {
                push @full_list, hex($1) .. hex($2);
            }
            else {
                push @full_list, hex $range;
            }
        }

        return [@full_list];
    }
}

1;
```
Output:
```
$ ./uni_char_class_2.pl
InThaiHCons:
ขฃฉฐถ
InThaiLCons:
คฅฆงชซฌญฑ
InThaiMCons:
Char class 'InThaiMCons' doesn't exist!

IsThaiHCons:
ขฃฉฐถ
IsThaiLCons:
คฅฆงชซฌญฑ
IsThaiMCons:
Char class 'IsThaiMCons' doesn't exist!

IfThaiHCons:
Char class 'IfThaiHCons' doesn't exist!

IfThaiLCons:
Char class 'IfThaiLCons' doesn't exist!

IfThaiMCons:
Char class 'IfThaiMCons' doesn't exist!

```
There are a number of improvements you could make to the module code depending on the Perl version you're targeting. You didn't indicate your Perl version. The code I've presented should, I believe, work fine with Perl 5.6 (but I have no way to check that).
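To give one idea of what a version-dependent improvement might look like, here is a sketch that assumes Perl 5.10 or later. It uses `state` variables in place of the enclosing bare blocks and the `%ThaiCons_expanded` cache. This is just one possible direction, and it won't run on 5.6 or 5.8.

```perl
use strict;
use warnings;
use feature 'state';    # ASSUMPTION: Perl 5.10+ is available

# Same expansion logic as _expand() in the module; the regex is now a
# state variable, compiled once, instead of a lexical in an outer block.
sub _expand {
    my ($code_range_list) = @_;

    state $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

    my @full_list;

    for my $range (@$code_range_list) {
        if ($range =~ $re) {
            push @full_list, hex($1) .. hex($2);
        }
        else {
            push @full_list, hex $range;
        }
    }

    return [@full_list];
}

# The state variable is initialised on the first call only, so the
# expansion is done once and the same arrayref is returned thereafter.
sub InThaiHCons {
    state $expanded = _expand([qw{0E02-0E03 0E09 0E10 0E16}]);
    return $expanded;
}

sub InThaiLCons {
    state $expanded = _expand([qw{0E04-0E07 0E0A-0E0D 0E11}]);
    return $expanded;
}
```

The despatch table and `list()` would not need to change; only the caching mechanism differs.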
— Ken
In reply to Re^3: Listing out the characters included in a character class by kcott, in thread Listing out the characters included in a character class by Polyglot