My module creates "User-Defined Character Properties" for Unicode characters in regular expressions (I think they're called character classes, but please correct me if I've misunderstood). I would like to add a function that shows the characters included in each defined group. One such group, in the package file, might look like this:

sub InThaiHCons { #Thai high-class consonants
    return join "\n",
        '0E02', #KHO KHAI
        '0E03', #KHO KHUAT
        '0E09', #CHO CHING
        '0E10', #THO THAN
        '0E16', #THO THUNG
        '0E1C', #PHO PHUNG
        '0E1D', #FO FA
        '0E28', #SO SALA
        '0E29', #SO RUSI
        '0E2A', #SO SUA
        '0E2B'; #HO HIP
}

...or like this:

sub InThaiLCons { #Thai low-class consonants
    return <<'END';
0E04 0E07
0E0A 0E0D
0E11 0E13
0E17 0E19
0E1E 0E27
0E2C
0E2E
END
}

How could the calling (main) program be given a list of every codepoint associated with that particular character class?

For example, I would like something like this...


my @characters = list('InThaiHCons');

#OR PERHAPS

my @characters = MyModule::list('InThaiHCons');  #IF FUNCTION IS NOT EXPORTED

print @characters; #[The site would not print the UTF-8 characters here--see below the code box.]

#Codepoints, e.g. '0E01' instead of actual characters would also be acceptable, as it should be trivial to convert them.

#ขฃฉฐถผฝศษสห

Is it possible to create a function that would provide such a list as this without having to duplicate the list in the module? Alternatively, is there any function by which such character classes can be spelled out already, e.g. is there a way to query what /\p{IsThai}/ includes?
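To make the first half of that question concrete, here is a minimal sketch of one possible approach, not a tested solution. It assumes the property subs only ever return plain lines of one hex codepoint or two whitespace-separated hex codepoints (a range), as in the two examples above; lines using the `+`, `-`, `!` or `&` prefixes that user-defined properties also allow are simply skipped. The helper name `list` and the package name `MyModule` are taken from the wished-for call above and are otherwise hypothetical.

```perl
use strict;
use warnings;

package MyModule;

# Sample property, copied from the post above.
sub InThaiLCons {
    return <<'END';
0E04 0E07
0E0A 0E0D
0E11 0E13
0E17 0E19
0E1E 0E27
0E2C
0E2E
END
}

# Hypothetical helper: call the named property sub and expand its
# return value into a flat list of numeric codepoints.
sub list {
    my ($name) = @_;
    my $code = __PACKAGE__->can($name)
        or die "No such property sub: $name\n";

    my @codepoints;
    for my $line (split /\n/, $code->()) {
        $line =~ s/#.*//;    # drop a trailing comment, if any
        # Accept "HEX" or "HEX HEX"; anything else is ignored (assumption).
        next unless $line =~ /^\s*([0-9A-Fa-f]+)(?:\s+([0-9A-Fa-f]+))?\s*$/;
        my $start = hex $1;
        my $end   = defined $2 ? hex $2 : $start;
        push @codepoints, $start .. $end;
    }
    return @codepoints;
}

package main;

binmode STDOUT, ':encoding(UTF-8)';

my @codepoints = MyModule::list('InThaiLCons');
print join('', map { chr } @codepoints), "\n";                       # the characters
print join(' ', map { sprintf 'U+%04X', $_ } @codepoints), "\n";     # the codepoints
```

Because the helper reads whatever the property sub returns, the list lives in one place only and nothing is duplicated in the module.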

Note that I have read the documentation on the subject and found the following, but I do not understand it, nor does it seem to do quite what I need.

https://perldoc.perl.org/Unicode::UCD#prop_invlist%28%29
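For the built-in side of the question, the following is a rough sketch of how `prop_invlist` from that page might be used. An inversion list alternates between the first codepoint that is in the set and the first codepoint after it that is not, so the pairs have to be expanded into ranges. The property string `Script=Thai` is an assumption about what `\p{IsThai}` is meant to match, and `expand_invlist` is a hypothetical helper name. As far as that documentation indicates, `prop_invlist` does not know about user-defined properties, so this would not cover `InThaiHCons` and friends.

```perl
use strict;
use warnings;
use Unicode::UCD qw(prop_invlist);

# Hypothetical helper: turn an inversion list into individual codepoints.
sub expand_invlist {
    my @invlist = @_;
    my @codepoints;
    for (my $i = 0; $i < @invlist; $i += 2) {
        # Each even index starts a range; the next element (if any) is the
        # first codepoint NOT in the set, so the range ends one before it.
        my $upper = $i + 1 < @invlist
            ? $invlist[$i + 1] - 1
            : $Unicode::UCD::MAX_CP;
        push @codepoints, $invlist[$i] .. $upper;
    }
    return @codepoints;
}

# Assumption: "Script=Thai" is the property of interest.
my @invlist = prop_invlist('Script=Thai');
die "Property not recognised\n" unless @invlist && defined $invlist[0];

my @thai = expand_invlist(@invlist);

binmode STDOUT, ':encoding(UTF-8)';
print join('', map { chr } @thai), "\n";
```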

Blessings,

~Polyglot~

In reply to Listing out the characters included in a character class by Polyglot
