⭐ in reply to How do I normalize (e.g. strip) diacritical märks from a Unicode string?
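
    use strict;
    use warnings;
    use utf8;                  # the source contains literal UTF-8 characters
    use Unicode::Normalize;

    my $s = "söme stüff\n";
    $s = NFD($s);              # decompose: "ö" becomes "o" + U+0308
    $s =~ s/\pM//g;            # remove every combining mark
    print $s;
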
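Depending on the application, NFKD may or may not be more appropriate than NFD. NFKD also applies compatibility decompositions (for example, it splits the ligature "ﬁ" into "f" and "i"), whereas NFD applies only canonical ones.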
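To make the difference concrete, here is a small sketch that runs the same mark-stripping step after NFD and after NFKD. The sample string and the variable names are only for illustration. It uses \x{...} escapes rather than literal characters, so it does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use Unicode::Normalize;

binmode STDOUT, ':utf8';

# Illustrative input: U+FB01 LATIN SMALL LIGATURE FI, then "anc", then U+00E9 (e with acute)
my $s = "\x{FB01}anc\x{E9}";

# Canonical decomposition only: the ligature is left alone
my $nfd = NFD($s);
$nfd =~ s/\pM//g;

# Compatibility decomposition: the ligature is split into "f" and "i"
my $nfkd = NFKD($s);
$nfkd =~ s/\pM//g;

print "NFD:  $nfd\n";
print "NFKD: $nfkd\n";
```

With NFD the ligature character survives the stripping; with NFKD the result is plain ASCII. Which of the two you want depends on whether compatibility characters should be folded as well.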
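The code snippet above removes every combining mark (Unicode category M), not just one particular diacritic. To remove only the diaeresis (umlaut), strip just \x{308} (U+0308 COMBINING DIAERESIS). The following code strips the diaeresis but leaves the acute and grave accents, recomposing the result with NFC: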
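
    use strict;
    use warnings;
    use utf8;                  # the source contains literal UTF-8 characters
    use Unicode::Normalize;

    binmode STDOUT, ':utf8';   # the output still contains non-ASCII characters

    my $s = "söme stüff with áccènts\n";
    $s = NFD($s);              # decompose into base characters + combining marks
    $s =~ s/\x{308}//g;        # remove only U+0308 COMBINING DIAERESIS
    $s = NFC($s);              # recompose the remaining accents
    print $s;
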
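If more than one specific mark should go, the same idea extends to a character class. The sketch below is only an illustration: strip_selected_marks is a made-up helper name, not part of Unicode::Normalize, and the choice of U+0308 and U+0303 (COMBINING TILDE) is arbitrary.

```perl
use strict;
use warnings;
use Unicode::Normalize;

binmode STDOUT, ':utf8';

# Hypothetical helper: remove only the listed combining marks (given as code points)
sub strip_selected_marks {
    my ($text, @codepoints) = @_;
    return $text unless @codepoints;    # nothing to strip; avoids an empty character class

    # Build the inside of a character class, e.g. \x{0308}\x{0303}
    my $class = join '', map { sprintf '\x{%04X}', $_ } @codepoints;

    my $out = NFD($text);               # decompose so the marks become separate characters
    $out =~ s/[$class]//g;              # drop only the selected marks
    return NFC($out);                   # recompose whatever marks remain
}

# "söme stüff with áccènts ñ", written with escapes
my $s = "s\x{F6}me st\x{FC}ff with \x{E1}cc\x{E8}nts \x{F1}";

# Strip the diaeresis (U+0308) and the tilde (U+0303), keep the accents
print strip_selected_marks($s, 0x308, 0x303), "\n";
```

The code points are passed as numbers so the caller does not have to write regex escapes by hand.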