in reply to Re^3: Listing out the characters included in a character class
in thread Listing out the characters included in a character class
I agree with this; however, others before me have already given us synonyms such as "InThai" and "IsThai", so people coming along later may not know which form to use. Sigh. To my mind, "InThai" looks like it represents a range and "IsThai" a quality, though in this case the two cover essentially the same Thai characters. The same is true of all of my Thai character groupings, essentially any time more than one character is involved. Because of this overlap, and because it boils down to mere semantics and what people will remember or prefer, I think it best to create the secondary names across the board, for flexibility and compatibility, even for groupings that return a single codepoint.
The Perl documentation is poor in this respect and does not clearly explain the distinctions among \p{Thai}, \p{InThai}, and \p{IsThai}. An explanation is offered at https://www.regular-expressions.info/unicode.html, which says:
Not all Unicode regex engines use the same syntax to match Unicode blocks. Java, Ruby 2.0, and XRegExp use the \p{InBlock} syntax as listed above. .NET and XML use \p{IsBlock} instead. Perl and the JGsoft flavor support both notations. I recommend you use the “In” notation if your regex engine supports it. “In” can only be used for Unicode blocks, while “Is” can also be used for Unicode properties and scripts, depending on the regular expression flavor you’re using. By using “In”, it’s obvious you’re matching a block and not a similarly named property or script.
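For anyone who wants to check how the three spellings behave on their own perl, here is a minimal sketch that counts how many codepoints in U+0E00..U+0E7F each form matches. The helper name count_matches is my own invention for illustration. As I read perluniprops, \p{InThai} names the Thai block while \p{IsThai} is a synonym for \p{Thai} (the script), so the counts need not be identical; treat that reading as an assumption and trust what your perl prints.

```perl
use strict;
use warnings;

# Hypothetical helper: count the codepoints in U+0E00..U+0E7F
# (the range of the Thai block) that match the given compiled regex.
sub count_matches {
    my ($re) = @_;
    return scalar grep { chr($_) =~ $re } 0x0E00 .. 0x0E7F;
}

# The three spellings under discussion.
my %forms = (
    '\p{Thai}'   => qr/\p{Thai}/,
    '\p{IsThai}' => qr/\p{IsThai}/,
    '\p{InThai}' => qr/\p{InThai}/,
);

for my $name (sort keys %forms) {
    printf "%-11s matches %d codepoints\n", $name, count_matches($forms{$name});
}
```

If the numbers differ on your system, the extra codepoints under \p{InThai} would be the ones that sit in the block without belonging to the script.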
Blessings,
~Polyglot~