in reply to Re: Listing out the characters included in a character class
in thread Listing out the characters included in a character class
Your InThaiHCons() and InThaiLCons() seem overcomplicated.
There are two nuances to this which you may not have grasped: 1) The two-column lines in 'InThaiLCons' indicate ranges, i.e. the '0E04 0E07' line actually covers '0E04 0E05 0E06 0E07'; and 2) I have formatted 'InThaiHCons' the way I have so that I can indicate in the markup what each codepoint represents. It's hard to look at a codepoint and remember which character it stands for, and as the code maintainer, this association helps me tremendously, especially for certain characters. However, I am considering removing those comments for the sake of code brevity and tidiness before releasing the module to CPAN, which I fully intend to do soon, having already delayed for years due to my own lack of confidence (this will be a first for me).
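To illustrate the first point, here is a minimal sketch of how a range line behaves inside a regular expression. The property name 'InThaiRangeDemo' is made up for this example and is not part of my module; I am assuming the documented format in which a line holds either one hex codepoint or two hex codepoints separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical demo property (not from the module): each line is either a
# single hex codepoint or "start<TAB>end", which denotes an inclusive range.
sub InThaiRangeDemo {
    return join "\n",
        "0E04\t0E07",    # range: covers 0E04, 0E05, 0E06 and 0E07
        "0E2E",          # single codepoint
        '';
}

# Probe a few codepoints on either side of the range.
for my $cp (0x0E03 .. 0x0E08, 0x0E2E) {
    my $verdict = chr($cp) =~ /\p{InThaiRangeDemo}/ ? 'matches' : 'no match';
    printf "U+%04X %s\n", $cp, $verdict;
}
```

The codepoints inside the range should match even though only its two endpoints are written out, while 0E03 and 0E08 should not.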
That said, in my quest for a way to do what I want, I discovered that the subroutines can be called directly in the code, outside the context of a regular expression, and they will themselves return the codepoints I desire. However, they do not preserve the two-column range layout shown in the 'InThaiLCons' of my example, simply putting all the codepoints in a straight list, so I have decided not to use those ranges, despite their obvious efficiency, and to list every single codepoint instead. This solves a couple of problems at once, at the sole cost of making the lists visibly longer (i.e. more code). So, my new 'InThaiLCons' would look like this:
sub InThaiLCons {
    return join "\n",
        '0E04', '0E05', '0E06', '0E07',
        '0E0A', '0E0B', '0E0C', '0E0D',
        '0E11', '0E12', '0E13',
        '0E17', '0E18', '0E19',
        '0E1E', '0E1F', '0E20', '0E21', '0E22', '0E23',
        '0E24', '0E25', '0E26', '0E27',
        '0E2C', '0E2E',
}
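For comparison, the ranges could in principle be kept and expanded at the point where the list is needed. The following is only a sketch under my own assumptions: 'expand_property' and 'InThaiLConsRanged' are hypothetical names, and the helper handles nothing beyond plain "hex" and "hex hex" lines (any other kind of line is silently skipped).

```perl
use strict;
use warnings;

# Hypothetical ranged variant, shortened to three ranges for the sketch.
sub InThaiLConsRanged {
    return join "\n", "0E04\t0E07", "0E0A\t0E0D", "0E11\t0E13", '';
}

# Hypothetical helper: turn a property string into a flat list of
# codepoint numbers, expanding "start end" lines into every member.
sub expand_property {
    my ($spec) = @_;
    my @codepoints;
    for my $line (split /\n/, $spec) {
        # Accept one hex number, optionally followed by a second one.
        next unless $line =~ /^\s*([0-9A-Fa-f]+)(?:\s+([0-9A-Fa-f]+))?\s*$/;
        my $first = hex($1);
        my $last  = defined($2) ? hex($2) : $first;
        push @codepoints, $first .. $last;
    }
    return @codepoints;
}

my @codepoints = expand_property(InThaiLConsRanged());
printf "%04X\n", $_ for @codepoints;
```

That would let the ranges stay, but for now the fully listed 'InThaiLCons' above is the version I am working with.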
However, after your suggestions, that can be more efficiently represented as:
sub InThaiLCons { return [qw{ 0E04 0E05 0E06 0E07 0E0A 0E0B 0E0C 0E0D 0E11 0E12 0E13 0E17 0E18 0E19 0E1E 0E1F 0E20 0E21 0E22 0E23 0E24 0E25 0E26 0E27 0E2C 0E2E }] }
I have a new problem, in that I want to use two names for each of these subroutines: i.e. 'InThai...' and 'IsThai...'. Essentially, they appear to be synonymous for many current usages, and I wish for either of these forms to be acceptable with this new functionality as well. So, must I repeat the entire subroutine in the code, changing only its name, or is there a way to alias it to another name?
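One candidate I am aware of is typeglob assignment, which installs a second name for the same subroutine without copying its body. This is a sketch rather than something I have tried in the module: the subroutine here is a shortened, string-returning stand-in, and the BEGIN block is my own precaution so that the alias exists before any regex using it is compiled.

```perl
use strict;
use warnings;

# Shortened stand-in for the real list, for illustration only.
sub InThaiLCons {
    return join "\n", qw{ 0E04 0E05 0E06 0E07 }, '';
}

# Make IsThaiLCons a second name for the very same subroutine.
# Done at compile time so that later regexes can already see it.
BEGIN { *IsThaiLCons = \&InThaiLCons; }

# Both names refer to one piece of code ...
print "same subroutine\n" if \&IsThaiLCons == \&InThaiLCons;

# ... and either spelling can be used as a property name.
my $char = chr(0x0E05);
print "both names match\n"
    if $char =~ /\p{InThaiLCons}/ && $char =~ /\p{IsThaiLCons}/;
```

If this holds up, each list would only have to be maintained in one place, with one extra line per alias.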
Regarding the use of <pre> tags, are they equivalent to the <code> tags? I had put the UTF-8 characters in a <code> block, and they got converted to ugly HTML entities. That's why I moved them outside of that block.
Incidentally, there will indeed also be an 'InThaiMCons' definition in this module (and more)!
Blessings,
~Polyglot~