You can still keep ranges; there are just better ways to represent them, as in the code below.

You can map Unicode names to individual codepoints, but that becomes awkward, and possibly messy, for ranges of codepoints. I recommend keeping the Unicode PDF Character Code Chart "Thai -- Range: 0E00-0E7F" at hand while developing. It lists the codepoints in order, with their glyphs and names, and some entries have additional notes. You might consider adding that link to your module's POD. If you're writing code for other (Unicode) scripts, you can find links to all of the current charts at "Unicode 15.1 Character Code Charts".
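
As a minimal sketch of the name/codepoint mapping, using the core charnames pragma's charnames::viacode() (codepoint to name) and charnames::vianame() (name to codepoint): each codepoint has exactly one name, so a range has to be expanded before its members can be named. The codepoint list here is just the high-class consonants from the module, written out by hand for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use charnames ();    # loads charnames::viacode() and charnames::vianame()

binmode STDOUT, ':encoding(UTF-8)';

# The high-class consonants from PolyUniCharClass2.pm, already expanded.
my @codepoints = (0x0E02 .. 0x0E03, 0x0E09, 0x0E10, 0x0E16);

# Codepoint -> name: one name per codepoint, so a range must be expanded
# before each of its members can be named.
for my $cp (@codepoints) {
    printf "U+%04X %s %s\n", $cp, chr $cp, charnames::viacode($cp);
}

# Name -> codepoint: the reverse lookup, if class members were listed by name.
my $kho_khai = charnames::vianame('THAI CHARACTER KHO KHAI');
printf "THAI CHARACTER KHO KHAI is U+%04X\n", $kho_khai;
```

Listing a class by name is self-documenting, but a range such as 0E04-0E07 then turns into four separate names, which is where it gets messy.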

Having multiple names for the same subroutine is often confusing and generally, in my opinion, a design flaw; however, it's easily achieved with additional keys in the despatch table. I would urge you to reconsider whether that's something you really need.
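
A minimal sketch of that, with hypothetical stand-in subroutines: rather than writing every alias by hand, the Is* keys can be generated from the In* keys. Each alias is just another hash key holding the same code reference, so nothing is duplicated.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for the real class subroutines.
sub InThaiHCons { return [0x0E02 .. 0x0E03, 0x0E09, 0x0E10, 0x0E16] }
sub InThaiLCons { return [0x0E04 .. 0x0E07, 0x0E0A .. 0x0E0D, 0x0E11] }

# Canonical names only.
my %despatch = (
    InThaiHCons => \&InThaiHCons,
    InThaiLCons => \&InThaiLCons,
);

# Add an "Is..." alias for every "In..." key. The alias is just another key
# holding the same code reference, so no subroutine is duplicated.
for my $name (grep { /^In/ } keys %despatch) {
    (my $alias = $name) =~ s/^In/Is/;
    $despatch{$alias} = $despatch{$name};
}

print join(', ', sort keys %despatch), "\n";
# prints: InThaiHCons, InThaiLCons, IsThaiHCons, IsThaiLCons
```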

Update: I've just posted and saw your reply to hippo. Given your explanation, use of multiple names seems valid in this instance.

New script and module:

ken@titan ~/tmp/pm_11155205_uni_char_class
$ ls -l *2*
-rw-r--r-- 1 ken None 1275 Oct 29 01:50 PolyUniCharClass2.pm
-rwxr-xr-x 1 ken None  370 Oct 29 01:42 uni_char_class_2.pl

uni_char_class_2.pl:

#!/usr/bin/env perl

use strict;
use warnings;

use open OUT => qw{:encoding(UTF-8) :std};

use lib '.'; # DEMO ONLY -- DON'T use in PRODUCTION!
use PolyUniCharClass2;

for my $prefix (qw{In Is If}) {
    for my $class (qw{H L M}) {
        my $cons = "${prefix}Thai${class}Cons";
        print "$cons:\n";
        print @{PolyUniCharClass2::list($cons)}, "\n";
    }
}

PolyUniCharClass2.pm:

package PolyUniCharClass2;

use strict;
use warnings;

{
    my %char_class_despatch = (
        InThaiHCons => \&InThaiHCons,
        InThaiLCons => \&InThaiLCons,
        IsThaiHCons => \&InThaiHCons,
        IsThaiLCons => \&InThaiLCons,
    );

    sub list {
        my ($char_class) = @_;

        unless (exists $char_class_despatch{$char_class}) {
            warn "Char class '$char_class' doesn't exist!\n";
            return [];
        }

        return [map chr, @{$char_class_despatch{$char_class}->()}];
    }
}

{
    my $ThaiHCons = [qw{0E02-0E03 0E09 0E10 0E16}];
    my $ThaiLCons = [qw{0E04-0E07 0E0A-0E0D 0E11}];
    my %ThaiCons_expanded;

    sub InThaiHCons {
        return $ThaiCons_expanded{InThaiHCons} ||= _expand($ThaiHCons);
    }

    sub InThaiLCons {
        return $ThaiCons_expanded{InThaiLCons} ||= _expand($ThaiLCons);
    }
}

{
    my $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

    sub _expand {
        my ($code_range_list) = @_;
        my @full_list;

        for my $range (@$code_range_list) {
            if ($range =~ $re) {
                push @full_list, hex($1) .. hex($2);
            }
            else {
                push @full_list, hex $range;
            }
        }

        return [@full_list];
    }
}

1;

Output:

$ ./uni_char_class_2.pl
InThaiHCons:
ขฃฉฐถ
InThaiLCons:
คฅฆงชซฌญฑ
InThaiMCons:
Char class 'InThaiMCons' doesn't exist!

IsThaiHCons:
ขฃฉฐถ
IsThaiLCons:
คฅฆงชซฌญฑ
IsThaiMCons:
Char class 'IsThaiMCons' doesn't exist!

IfThaiHCons:
Char class 'IfThaiHCons' doesn't exist!

IfThaiLCons:
Char class 'IfThaiLCons' doesn't exist!

IfThaiMCons:
Char class 'IfThaiMCons' doesn't exist!
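
Separately from the despatch-table approach, and only as a sketch: Perl's own user-defined character properties (documented in perlunicode) are also subroutines whose names start with "In" or "Is". They return newline-separated lines, each a single hex codepoint or a tab-separated hex range. The same range data could back such a property, which can then be used directly as \p{...} in a regex, and the members can be listed by testing every codepoint in the Thai block. The IsThaiHCons sub below is a hypothetical standalone example, not part of PolyUniCharClass2.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# User-defined character property: the name must start with "In" or "Is";
# each returned line is one hex codepoint or a "start<TAB>end" hex range.
# Defined before the regex below, so it exists when that regex is compiled.
sub IsThaiHCons {
    return "0E02\t0E03\n0E09\n0E10\n0E16\n";
}

# List the members by testing every codepoint in the Thai block (0E00-0E7F).
my @members = map { chr $_ }
              grep { chr($_) =~ /\p{IsThaiHCons}/ } 0x0E00 .. 0x0E7F;

print @members, "\n";    # prints ขฃฉฐถ
```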

There are a number of improvements you could make to the module code depending on the Perl version you're targeting. You didn't indicate your Perl version. The code I've presented should, I believe, work fine with Perl 5.6 (but I have no way to check that).
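
For example, assuming Perl 5.10 or later, a state variable could replace the enclosing bare block and the %ThaiCons_expanded cache: each class subroutine expands its ranges once on the first call and returns the same array reference afterwards. This is one possible change, sketched as a standalone script; the _expand() here behaves like the one in PolyUniCharClass2.pm.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'state';    # requires Perl 5.10 or later

# Expand hex codepoints and "start-end" hex ranges into ordinals
# (same behaviour as _expand() in PolyUniCharClass2.pm).
sub _expand {
    my ($code_range_list) = @_;
    my @full_list;
    for my $range (@$code_range_list) {
        if ($range =~ /^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$/) {
            push @full_list, hex($1) .. hex($2);
        }
        else {
            push @full_list, hex $range;
        }
    }
    return \@full_list;
}

# Each sub initialises its own state variable once, on first call, and
# returns the cached array reference on every later call.
sub InThaiHCons {
    state $expanded = _expand([qw{0E02-0E03 0E09 0E10 0E16}]);
    return $expanded;
}

sub InThaiLCons {
    state $expanded = _expand([qw{0E04-0E07 0E0A-0E0D 0E11}]);
    return $expanded;
}

print join(' ', map { sprintf('%04X', $_) } @{ InThaiHCons() }), "\n";
# prints: 0E02 0E03 0E09 0E10 0E16
```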

— Ken


In reply to Re^3: Listing out the characters included in a character class by kcott
in thread Listing out the characters included in a character class by Polyglot