PerlMonks  

Re^3: The “real length" of UTF8 strings

by moritz (Cardinal)
on Sep 24, 2008 at 07:57 UTC ( [id://713369]=note )


in reply to Re^2: The “real length" of UTF8 strings
in thread The “real length" of UTF8 strings

Sure, but the Han script probably contains about 40000 characters: there's no way to write that list by hand.

That's why my example queries each character for the Unicode property \p{Han}, i.e. whether the character belongs to that script.
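For readers who don't have the earlier node at hand, here is a minimal sketch of that kind of check. It is not necessarily the code from my earlier reply; the sub name count_han and the sample string are made up for illustration, and the two Han characters are written as \x{...} escapes so the snippet doesn't depend on the encoding of the source file.

```perl
use strict;
use warnings;

# Count the characters in a (decoded) string that belong to the Han script.
# count_han is a hypothetical helper name, used only for this illustration.
sub count_han {
    my ($text) = @_;
    # A global match in list context returns every match;
    # assigning through the empty list "() =" yields the number of matches.
    my $count = () = $text =~ /\p{Han}/g;
    return $count;
}

# U+6F22 and U+5B57 are two Han characters.
my $sample = "Perl \x{6F22}\x{5B57} test";
printf "%d Han characters out of %d total\n",
    count_han($sample), length $sample;
# prints "2 Han characters out of 12 total"
```

Note that this only works on character strings: input read from a file or socket has to be decoded first (for example with an :encoding(UTF-8) layer), otherwise the regex sees bytes rather than characters.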

For a better description of Unicode properties, scripts and blocks in regexes I recommend "Mastering Regular Expressions" by Jeffrey Friedl, pages 121ff.
