In my last response, I believe I covered all of the coding issues. I finished with:
"There are a number of improvements you could make to the module code depending on the Perl version you're targeting. ... The code I've presented should, I believe, work fine with Perl 5.6 ..."
Perl does a great job of keeping up with Unicode versions. The latest Unicode version is 15.1; Perl v5.38 (the latest stable version) supports Unicode 15.0 (see "perl5380delta: Unicode 15.0 is supported"). Writing your code for Perl 5.6 may be insufficient to handle the Unicode support you need; look through the deltas to find the minimum Perl version for your needs.
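If you're unsure which Unicode version a given perl supports, you can ask it directly rather than reading the deltas. This is a minimal sketch using the core Unicode::UCD module; it deliberately avoids "use v5.38" so that it can also be run on older perls you might be targeting.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::UCD ();

# Report the Unicode version this perl was built with,
# alongside the version of perl itself.
printf "Perl %vd supports Unicode %s\n",
    $^V, Unicode::UCD::UnicodeVersion();
```

Running it under each candidate perl gives a quick way to compare what is installed against the Unicode version you need.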
Partly because it was a fun task for me, but also to show you some of the improvements you could get from a later version, here's the code rewritten for Perl v5.38 and Unicode 15.0.
New script and module:
    ken@titan ~/tmp/pm_11155205_uni_char_class $ ls -l *3*
    -rw-r--r-- 1 ken None 993 Oct 29 05:03 PolyUniCharClass3.pm
    -rwxr-xr-x 1 ken None 344 Oct 29 05:03 uni_char_class_3.pl
uni_char_class_3.pl:
    #!/usr/bin/env perl

    use v5.38;
    use open OUT => qw{:encoding(UTF-8) :std};

    use lib '.';    # DEMO ONLY -- DON'T use in PRODUCTION!
    use PolyUniCharClass3;

    for my $prefix (qw{In Is If}) {
        for my $class (qw{H L M}) {
            my $cons = "${prefix}Thai${class}Cons";
            say "$cons:";
            say PolyUniCharClass3::list($cons)->@*;
        }
    }
PolyUniCharClass3.pm:
    package PolyUniCharClass3;

    use v5.38;

    sub list ($char_class) {
        state $valid_char_class = {map +($_ => 1), qw{
            InThaiHCons IsThaiHCons
            InThaiLCons IsThaiLCons
        }};

        unless (exists $valid_char_class->{$char_class}) {
            warn "Char class '$char_class' doesn't exist!\n";
            return [];
        }

        return [map chr, ThaiCons(substr $char_class, 2)->@*];
    }

    sub ThaiCons ($cons) {
        state $code_ranges = {
            ThaiHCons => [qw{0E02-0E03 0E09 0E10 0E16}],
            ThaiLCons => [qw{0E04-0E07 0E0A-0E0D 0E11}],
        };

        state $ThaiCons_expanded;

        return $ThaiCons_expanded->{$cons} //= _expand($code_ranges->{$cons});
    }

    sub _expand ($code_range_list) {
        state $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

        my @full_list;

        for my $range ($code_range_list->@*) {
            if ($range =~ $re) {
                push @full_list, hex($1) .. hex($2);
            }
            else {
                push @full_list, hex $range;
            }
        }

        return [@full_list];
    }
Output (unchanged):
    ken@titan ~/tmp/pm_11155205_uni_char_class $ ./uni_char_class_3.pl
    InThaiHCons:
    ขฃฉฐถ
    InThaiLCons:
    คฅฆงชซฌญฑ
    InThaiMCons:
    Char class 'InThaiMCons' doesn't exist!

    IsThaiHCons:
    ขฃฉฐถ
    IsThaiLCons:
    คฅฆงชซฌญฑ
    IsThaiMCons:
    Char class 'IsThaiMCons' doesn't exist!

    IfThaiHCons:
    Char class 'IfThaiHCons' doesn't exist!

    IfThaiLCons:
    Char class 'IfThaiLCons' doesn't exist!

    IfThaiMCons:
    Char class 'IfThaiMCons' doesn't exist!

There were a couple of points at the end of your post which I didn't address. Here goes:
"Regarding the use of <pre> tags, are they equivalent to the <code> tags?"
They sort of do the same job but have these differences:
"Incidentally, there will indeed also be an 'InThaiMCons' definition in this module (and more)!"
I picked names like If* and *M* for my testing because they don't exist in the module, so they exercise the failure path. Your test suite (t/*.t scripts) should check that both success and failure are handled appropriately.
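As a rough idea of what such a test might look like, here's a sketch using the core Test::More module. The StandIn package is hypothetical: it only mimics the behaviour of PolyUniCharClass3::list() for one class so that the sketch runs on its own. A real t/*.t script would load your module and call its list() function instead.

```perl
#!/usr/bin/env perl
use v5.38;
use Test::More;

# Hypothetical stand-in for PolyUniCharClass3::list(), included only so
# this sketch is self-contained; replace it with "use PolyUniCharClass3;".
package StandIn {
    my %chars = (
        InThaiHCons => [map chr, 0x0E02 .. 0x0E03, 0x0E09, 0x0E10, 0x0E16],
    );

    sub list ($char_class) {
        unless (exists $chars{$char_class}) {
            warn "Char class '$char_class' doesn't exist!\n";
            return [];
        }
        return $chars{$char_class};
    }
}

# Success: a known class returns the expected characters.
is_deeply(
    StandIn::list('InThaiHCons'),
    [map chr, 0x0E02, 0x0E03, 0x0E09, 0x0E10, 0x0E16],
    'known class returns expected characters',
);

# Failure: an unknown class returns an empty arrayref and warns once.
{
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    is_deeply(StandIn::list('IfThaiMCons'), [], 'unknown class returns []');
    is(scalar @warnings, 1, 'exactly one warning issued');
    like($warnings[0], qr/doesn't exist/, 'warning explains the problem');
}

done_testing;
```

Capturing warnings with a local $SIG{__WARN__} handler keeps the test output clean and lets you assert on the message itself. Once 'InThaiMCons' is added to your module, it would move from the failure tests to the success tests.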
— Ken
In reply to Re^3: Listing out the characters included in a character class [v5.38] by kcott, in thread Listing out the characters included in a character class by Polyglot