in reply to Unaccenting characters
I'm not a big fan of such big tables, so instead I'd propose this:
use 5.010;
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw/NFKD/;

sub unaccent {
    my $s = NFKD shift;
    $s =~ s/\pM//g;
    return $s;
}

say unaccent "Les Misérables";

__END__
Output:
Les Miserables
The NFKD normalization form splits the base character and the accent into two separate characters (and, unlike plain NFD, it also expands compatibility characters such as ligatures); the substitution then removes all the marks (\pM).
(Unicode::Normalize has been a core module since perl 5.8, and you really, really don't want to use anything older than that for Unicode stuff.)
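To make the decomposition step a bit more concrete, here is a small sketch of what the normalization does before the marks are stripped. The helpers strip_marks and codepoints are names I made up for this illustration, not part of Unicode::Normalize, and the strings use \x{...} escapes so the script does not depend on the file's encoding.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFD NFKD/;

# Hypothetical helper: apply a normalization (passed as a code ref),
# then remove all combining marks.
sub strip_marks {
    my ($normalize, $s) = @_;
    $s = $normalize->($s);
    $s =~ s/\pM//g;
    return $s;
}

# Hypothetical helper: render a string as a list of code points,
# so nothing non-ASCII has to be printed.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# e-acute (U+00E9) decomposes into "e" plus U+0301 COMBINING ACUTE ACCENT.
print codepoints(NFKD("\x{E9}")), "\n";                      # U+0065 U+0301

# The "fi" ligature (U+FB01) only has a compatibility decomposition,
# so NFD leaves it alone while NFKD expands it to the letters f and i.
print codepoints(strip_marks(\&NFD,  "\x{FB01}n")), "\n";    # U+FB01 U+006E
print codepoints(strip_marks(\&NFKD, "\x{FB01}n")), "\n";    # U+0066 U+0069 U+006E

# o-with-stroke (U+00F8) has no decomposition at all, so it survives.
print codepoints(strip_marks(\&NFKD, "s\x{F8}n")), "\n";     # U+0073 U+00F8 U+006E
```

The last case is the caveat to keep in mind: letters that Unicode does not define as base character plus combining mark (o-with-stroke, sharp s and the like) pass through untouched, so if you need those mapped to ASCII as well you would still want a small table for them.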
Re^2: Unaccenting characters
by mwhiting (Beadle) on Aug 29, 2013 at 16:37 UTC
by moritz (Cardinal) on Aug 29, 2013 at 17:36 UTC