In my last response, I believe I covered all of the coding issues. I finished with:

"There are a number of improvements you could make to the module code depending on the Perl version you're targeting. ... The code I've presented should, I believe, work fine with Perl 5.6 ..."

Perl does a great job of keeping up with Unicode versions. The latest Unicode version is 15.1; Perl v5.38 (the latest stable version) supports Unicode 15.0 (see "perl5380delta: Unicode 15.0 is supported"). Code written for Perl 5.6 may not give you the Unicode support you need; look through the perldelta documents to find the minimum Perl version for your needs.
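If you want to check which Unicode version a given perl actually bundles, rather than reading through the deltas, something like the following should do it. This is only a sketch: Unicode::UCD::UnicodeVersion() is a core function, but version_at_least() and the '6.0.0' requirement are names and values I've made up for illustration. It deliberately avoids v5.38-only syntax so that it can be run on older perls too.

```perl
use strict;
use warnings;
use Unicode::UCD ();

# Unicode version bundled with the running perl, as a dotted string.
my $unicode_version = Unicode::UCD::UnicodeVersion();

# Hypothetical requirement for this demo; substitute your own.
my $required = '6.0.0';

# Compare two dotted versions numerically, component by component.
# Returns true if $have is the same as, or later than, $want.
sub version_at_least {
    my ($have, $want) = @_;
    my @have = split /\./, $have;
    my @want = split /\./, $want;
    for my $i (0 .. 2) {
        my $cmp = ($have[$i] // 0) <=> ($want[$i] // 0);
        return $cmp > 0 if $cmp;
    }
    return 1;
}

printf "perl v%vd bundles Unicode %s\n", $^V, $unicode_version;
print version_at_least($unicode_version, $required)
    ? "OK: Unicode $required or later is available\n"
    : "Too old: need Unicode $required, have $unicode_version\n";
```

Running this under each perl you have installed gives a quick way to confirm what the deltas say.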

Partly because it was a fun task for me, but also to show you some of the improvements you could get from a later version, here's the code rewritten for Perl v5.38 and Unicode 15.0.

New script and module:

ken@titan ~/tmp/pm_11155205_uni_char_class
$ ls -l *3*
-rw-r--r-- 1 ken None 993 Oct 29 05:03 PolyUniCharClass3.pm
-rwxr-xr-x 1 ken None 344 Oct 29 05:03 uni_char_class_3.pl

uni_char_class_3.pl:

#!/usr/bin/env perl

use v5.38;

use open OUT => qw{:encoding(UTF-8) :std};

use lib '.';    # DEMO ONLY -- DON'T use in PRODUCTION!
use PolyUniCharClass3;

for my $prefix (qw{In Is If}) {
    for my $class (qw{H L M}) {
        my $cons = "${prefix}Thai${class}Cons";
        say "$cons:";
        say PolyUniCharClass3::list($cons)->@*;
    }
}

PolyUniCharClass3.pm:

package PolyUniCharClass3;

use v5.38;

sub list ($char_class) {
    state $valid_char_class = {map +($_ => 1), qw{
        InThaiHCons IsThaiHCons
        InThaiLCons IsThaiLCons
    }};

    unless (exists $valid_char_class->{$char_class}) {
        warn "Char class '$char_class' doesn't exist!\n";
        return [];
    }

    return [map chr, ThaiCons(substr $char_class, 2)->@*];
}

sub ThaiCons ($cons) {
    state $code_ranges = {
        ThaiHCons => [qw{0E02-0E03 0E09 0E10 0E16}],
        ThaiLCons => [qw{0E04-0E07 0E0A-0E0D 0E11}],
    };

    state $ThaiCons_expanded;

    return $ThaiCons_expanded->{$cons} //= _expand($code_ranges->{$cons});
}

sub _expand ($code_range_list) {
    state $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

    my @full_list;

    for my $range ($code_range_list->@*) {
        if ($range =~ $re) {
            push @full_list, hex($1) .. hex($2);
        }
        else {
            push @full_list, hex $range;
        }
    }

    return [@full_list];
}

Output (unchanged):

ken@titan ~/tmp/pm_11155205_uni_char_class
$ ./uni_char_class_3.pl
InThaiHCons:
ขฃฉฐถ
InThaiLCons:
คฅฆงชซฌญฑ
InThaiMCons:
Char class 'InThaiMCons' doesn't exist!

IsThaiHCons:
ขฃฉฐถ
IsThaiLCons:
คฅฆงชซฌญฑ
IsThaiMCons:
Char class 'IsThaiMCons' doesn't exist!

IfThaiHCons:
Char class 'IfThaiHCons' doesn't exist!

IfThaiLCons:
Char class 'IfThaiLCons' doesn't exist!

IfThaiMCons:
Char class 'IfThaiMCons' doesn't exist!

There were a couple of points at the end of your post which I didn't address. Here goes:

"Regarding the use of <pre> tags, are they equivalent to the <code> tags?"

They sort of do the same job but have these differences:

"Incidentally, there will indeed also be an 'InThaiMCons' definition in this module (and more)!"

I deliberately picked names like If* and *M* to exercise the failure path in my testing. Your test suite (t/*.t scripts) should check that both success and failure are handled appropriately.
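Here's a rough idea of what one of those t/*.t scripts might look like, checking one success and one failure. Treat it as a sketch under assumptions, not a finished test: My::StandIn and with_warnings() are names I've invented, and the stand-in list() only exists so the example runs on its own (a real script would "use PolyUniCharClass3;" and call PolyUniCharClass3::list() instead). The code points are copied from the ThaiHCons demo data in the module.

```perl
use strict;
use warnings;
use Test::More;

# Stand-in for PolyUniCharClass3::list() so this sketch is self-contained.
# Hypothetical package name; one class only, using the demo code points.
{
    package My::StandIn;
    my %chars_for = (InThaiHCons => [0x0E02, 0x0E03, 0x0E09, 0x0E10, 0x0E16]);

    sub list {
        my ($char_class) = @_;
        unless (exists $chars_for{$char_class}) {
            warn "Char class '$char_class' doesn't exist!\n";
            return [];
        }
        return [map { chr } @{ $chars_for{$char_class} }];
    }
}

# Run a coderef; return its result and an arrayref of warnings it emitted.
sub with_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $result = $code->();
    return ($result, \@warnings);
}

# Success: a known class returns its characters and emits no warning.
my ($ok_list, $ok_warn) = with_warnings(sub { My::StandIn::list('InThaiHCons') });
is(join('', @$ok_list), "\x{0E02}\x{0E03}\x{0E09}\x{0E10}\x{0E16}",
    'known class lists its characters');
is(scalar @$ok_warn, 0, 'known class emits no warning');

# Failure: an unknown class returns an empty list and warns once.
my ($bad_list, $bad_warn) = with_warnings(sub { My::StandIn::list('IfThaiHCons') });
is_deeply($bad_list, [], 'unknown class returns an empty list');
is(scalar @$bad_warn, 1, 'unknown class emits exactly one warning');
like($bad_warn->[0], qr/doesn't exist/, 'warning names the problem');

done_testing();
```

Capturing warnings with a local $SIG{__WARN__} handler keeps the sketch to core modules only; a CPAN module for testing warnings would be a reasonable alternative if you already depend on one.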

— Ken


In reply to Re^3: Listing out the characters included in a character class [v5.38] by kcott
in thread Listing out the characters included in a character class by Polyglot
