in reply to Re^2: The "real length" of UTF8 strings
Sure, but the Han script probably contains 40000 characters or more: there is no way to write that list by hand.
That's why my example tests each character against the Unicode property \p{Han}, i.e. whether the character belongs to that script.
For a better description of Unicode properties, scripts and blocks in regexes I recommend "Mastering Regular Expressions" by Jeffrey Friedl, pages 121ff.
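
For anyone who wants to try the property test on its own, here is a minimal sketch. The helper name count_han and the sample string are mine, made up for illustration; the only thing it relies on is Perl's built-in \p{Han} property. The sample uses \x{...} escapes so that the source file's encoding does not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters in a (decoded) string
# that match the Unicode property \p{Han}.
sub count_han {
    my ($text) = @_;
    # The empty-list assignment forces list context, so the match
    # returns every hit and $count receives the number of hits.
    my $count = () = $text =~ /\p{Han}/g;
    return $count;
}

# Sample string: "Perl", two Han characters, "abc", two more Han characters.
my $sample = "Perl \x{6F22}\x{5B57} abc \x{4E2D}\x{6587}";

printf "%d of %d characters are Han\n", count_han($sample), length $sample;
# prints "4 of 14 characters are Han"
```

Note that this only gives sensible numbers on character strings. If the data arrives as raw UTF-8 bytes, it would have to be decoded first (for example with Encode::decode), otherwise both length and the regex see bytes rather than characters.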