This was the best response yet...and the voters seem to agree. Thank you.

Your InThaiHCons() and InThaiLCons() seem overcomplicated.

There are two nuances to this which you may not have grasped: 1) The two-column codepoint lines in 'InThaiLCons' indicate ranges, i.e. the '0E04 0E07' line actually covers '0E04 0E05 0E06 0E07'; and 2) I have formatted 'InThaiHCons' the way I have in order to note in the source which character each codepoint represents. It is hard to look at a codepoint and remember which character it stands for, and as the code maintainer I find this association tremendously helpful, especially for certain characters. However, I am considering removing those comments for the sake of brevity and tidiness before releasing the module to CPAN, which I fully intend to do soon, having already delayed for years out of my own lack of confidence (this will be a first for me).
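To illustrate the range point, here is a minimal sketch. The property name 'InThaiDemo' is hypothetical and cut down to two lines; it is not the real 'InThaiLCons'. A line holding two hex numbers separated by whitespace is treated as an inclusive range, and ordinary Perl comments can sit beside each entry without touching the returned string.

```perl
use strict;
use warnings;

# Hypothetical cut-down property, for illustration only.
sub InThaiDemo {
    return join "\n",
        "0E04\t0E07",    # range: KHO KHWAI through NGO NGU
        "0E0A";          # single codepoint: CHO CHANG
}

# Test every codepoint in a small window against the property.
my @hits = grep { chr($_) =~ /\p{InThaiDemo}/ } 0x0E00 .. 0x0E0F;

# Prints 0E04, 0E05, 0E06, 0E07 and 0E0A, one per line.
printf "%04X\n", $_ for @hits;
```

The single range line accounts for all four of 0E04 through 0E07, although only the two endpoints are written out.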

That said, in my search for ways to do what I want, I discovered that the subroutines can also be called directly in the code, outside of a regular expression, and they will themselves return the codepoints I desire. However, they do not preserve the two-column range format shown in the 'InThaiLCons' of my example, simply putting all the codepoints in a straight list. So I have decided not to use those ranges, despite their obvious efficiency, and just list every single codepoint. This solves a couple of problems at once, at the sole cost of longer lists (i.e. more code). So, my new 'InThaiLCons' would look like this:

sub InThaiLCons { return join "\n", '0E04', '0E05', '0E06', '0E07', '0E0A', '0E0B', '0E0C', '0E0D', '0E11', '0E12', '0E13', '0E17', '0E18', '0E19', '0E1E', '0E1F', '0E20', '0E21', '0E22', '0E23', '0E24', '0E25', '0E26', '0E27', '0E2C', '0E2E', }

However, after your suggestions, that can be more efficiently represented as:

sub InThaiLCons { return [qw{ 0E04 0E05 0E06 0E07 0E0A 0E0B 0E0C 0E0D 0E11 0E12 0E13 0E17 0E18 0E19 0E1E 0E1F 0E20 0E21 0E22 0E23 0E24 0E25 0E26 0E27 0E2C 0E2E }] }

I have a new problem, in that I want to use two names for each of these subroutines, i.e. 'InThai...' and 'IsThai...'. The two prefixes appear to be synonymous in many current usages, and I wish for either form to be acceptable with this new functionality as well. So, must I repeat the entire subroutine in the code, changing only its name? Or is there a way to alias it to another name?
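One possible approach, sketched under assumptions: a typeglob assignment gives an existing subroutine a second name without repeating its body. Note that this sketch returns a newline-joined string rather than an array reference, because a subroutine used as a \p{...} property has to return a string; that change is an assumption about how the sub will be used, not something settled in this thread.

```perl
use strict;
use warnings;

# String form (not an array ref), so the sub still works as a \p{...} property.
sub InThaiLCons {
    return join "\n", qw{
        0E04 0E05 0E06 0E07 0E0A 0E0B 0E0C 0E0D 0E11 0E12 0E13 0E17 0E18
        0E19 0E1E 0E1F 0E20 0E21 0E22 0E23 0E24 0E25 0E26 0E27 0E2C 0E2E
    };
}

# Alias: both names now refer to the very same subroutine.
# Done in BEGIN so the second name already exists when regexes are compiled.
BEGIN { *IsThaiLCons = \&InThaiLCons }

my $kho_khwai = "\x{0E04}";
print "In form matches\n" if $kho_khwai =~ /\p{InThaiLCons}/;
print "Is form matches\n" if $kho_khwai =~ /\p{IsThaiLCons}/;
print "same definition\n" if IsThaiLCons() eq InThaiLCons();
```

Since the alias shares the one code reference, any later edit to the codepoint list shows up under both names.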

Regarding the use of <pre> tags, are they equivalent to the <code> tags? I had put the UTF-8 characters in a <code> block, and they got converted to ugly HTML entities. That's why I moved them outside of that block.

Incidentally, there will indeed also be an 'InThaiMCons' definition in this module (and more)!
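With several such definitions planned, a single helper could expand any of them into a flat list, so the range notation would not have to be given up. This is only a sketch: 'expand_codepoints' is a hypothetical name, the ranges below are reconstructed from the explicit 'InThaiLCons' list above, and the helper handles only plain "codepoint" and "start end" lines, not the '+', '-', '!' or '&' prefixes that property definitions may also contain.

```perl
use strict;
use warnings;

# Range form of the same 26 codepoints (reconstructed; an assumption).
sub InThaiLCons {
    return join "\n",
        '0E04 0E07', '0E0A 0E0D', '0E11 0E13', '0E17 0E19',
        '0E1E 0E27', '0E2C', '0E2E';
}

# Hypothetical helper: turn a property string into one number per codepoint.
sub expand_codepoints {
    my ($spec) = @_;
    my @codepoints;
    for my $line (split /\n/, $spec) {
        my ($start, $end) = split ' ', $line;
        next unless defined $start;           # skip blank lines
        $end = $start unless defined $end;    # single codepoint, not a range
        push @codepoints, hex($start) .. hex($end);
    }
    return @codepoints;
}

my @codepoints = expand_codepoints(InThaiLCons());
printf "%04X\n", $_ for @codepoints;
```

The same sub still serves as a \p{InThaiLCons} property in a regex, while the helper covers the cases where the individual codepoints are needed.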

Blessings,

~Polyglot~


In reply to Re^2: Listing out the characters included in a character class by Polyglot
in thread Listing out the characters included in a character class by Polyglot