Re: How do I normalize (e.g. strip) diacritical märks from a Unicode string?

by brycen (Monk)
on Apr 20, 2010 at 18:22 UTC (#835849)


in reply to How do I normalize (e.g. strip) diacritical märks from a Unicode string?

afoken: No, this is normalization, not ASCIIfication. ASCIIfying (i.e. encoding to ASCII) would destroy non-Latin text, whereas this method preserves Greek, Hebrew, etc.

Alexander: I am supporting clients in various languages who want the fuzzy matching that stripping diacriticals provides. It might occasionally confuse German bears (Bär) with bars (Bar), but that is much better than missing all the potential correct matches. For example, in Hebrew, vowels are not normally written except for children. Stripping the vowel and pronunciation diacriticals lets you compare the text as an adult searcher will likely enter it.
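To make the fuzzy-matching idea concrete, here is a minimal sketch. It assumes the core Unicode::Normalize module and treats "diacritical" as "any nonspacing mark" (\p{Mn}). The helper names strip_marks and fuzzy_eq are made up for illustration and are not from the original thread.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, then drop nonspacing marks (\p{Mn}).
sub strip_marks {
    my ($text) = @_;
    (my $bare = NFD($text)) =~ s/\p{Mn}+//g;
    return $bare;
}

# Hypothetical helper: true when two strings are equal once marks are removed.
sub fuzzy_eq {
    my ($left, $right) = @_;
    return strip_marks($left) eq strip_marks($right);
}

# Hebrew "shalom" with vowel points vs. the unpointed form a searcher would type.
my $pointed   = "\x{5E9}\x{5B8}\x{5C1}\x{5DC}\x{5D5}\x{5B9}\x{5DD}";
my $unpointed = "\x{5E9}\x{5DC}\x{5D5}\x{5DD}";
print fuzzy_eq($pointed, $unpointed) ? "match\n" : "no match\n";

# The German trade-off: "B\x{E4}r" (bear) collapses to "Bar".
print fuzzy_eq("B\x{E4}r", "Bar") ? "match\n" : "no match\n";
```

Both comparisons report a match, which is exactly the trade-off described above: the Hebrew search succeeds, at the cost of Bär and Bar becoming indistinguishable.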

Note that I prefer normalization form NFKD, as it decomposes more ligatures (though not all; for example, the ligature Π).
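A small sketch of the NFD versus NFKD difference on a ligature, again assuming Unicode::Normalize. The codepoints helper is hypothetical and only there so the output stays ASCII.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFKD);

# Hypothetical helper: render a string as a list of code points.
sub codepoints {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
}

my $ligature = "\x{FB01}";    # LATIN SMALL LIGATURE FI

# Canonical decomposition leaves the ligature alone ...
print 'NFD:  ', codepoints(NFD($ligature)), "\n";     # NFD:  U+FB01

# ... while compatibility decomposition splits it into "f" and "i".
print 'NFKD: ', codepoints(NFKD($ligature)), "\n";    # NFKD: U+0066 U+0069
```

If the stripped text is only used as a search key, the extra folding done by NFKD is usually harmless. It is lossy, though, so it is probably best to keep the original string for display.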

Originally posted as a Categorized Answer.
