Re: Unaccenting characters

I'm not a big fan of such big tables, so instead I'd propose this:

use 5.010;
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw/NFKD/;

sub unaccent {
    my $s = NFKD shift;
    $s =~ s/\pM//g;
    return $s;
}

say unaccent "Les Misérables";
__END__
Output:
Les Miserables

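The NFKD normalization form, like NFD, splits each accented character into its base character followed by separate combining marks (NFKD additionally applies compatibility decompositions, for example splitting ligatures into plain letters). The substitution then removes all the marks (\pM).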
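To make the decomposition step visible, here is a minimal sketch that prints the code points before and after normalization. The helper name codepoints is made up for this illustration; NFD and NFKD are the functions exported by Unicode::Normalize.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFD NFKD/;

# Hypothetical helper for this illustration: render a string as a list
# of U+XXXX code points so the decomposition can be inspected.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $str);
}

my $e_acute = "\x{E9}";                  # precomposed "e with acute"
print codepoints($e_acute), "\n";        # one code point
print codepoints(NFD($e_acute)), "\n";   # base letter, then combining acute

# Removing everything with the Mark property leaves only the base letter.
(my $stripped = NFD($e_acute)) =~ s/\pM//g;
print codepoints($stripped), "\n";

# NFKD also applies compatibility decompositions: the "fi" ligature
# (U+FB01) becomes the two plain letters f and i, which NFD leaves alone.
print codepoints(NFKD("\x{FB01}")), "\n";
```

Note that this only works for characters that actually have a decomposition. A letter such as ø (U+00F8) has none, so it passes through unchanged.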

(Unicode::Normalize has been a core module since perl 5.8, and you really, really don't want to use anything older than that for Unicode stuff.)

Perl 6 - the future is here, just unevenly distributed


Re^2: Unaccenting characters by mwhiting (Beadle) on Aug 29, 2013 at 16:37 UTC
Thanks, I will try that. What is the 'shift' supposed to do in the code? I know what it does in general, but it was in the original code, and now here, and I don't quite see how it fits in.
Re^3: Unaccenting characters by moritz (Cardinal) on Aug 29, 2013 at 17:36 UTC
shift without an argument, used inside a subroutine, removes and returns the first element of @_ (the subroutine's argument list), so here it fetches the string that is passed to the subroutine.
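As a rough illustration of the point about shift, the two subroutines below do the same job. The names unaccent_terse and unaccent_explicit are invented for this comparison; the first mirrors the code in the post, the second spells out the argument handling.

```perl
use strict;
use warnings;
use Unicode::Normalize qw/NFKD/;

# Terse form, as in the post: a bare shift inside a sub operates on @_,
# so this is NFKD applied to the first argument.
sub unaccent_terse {
    my $s = NFKD shift;
    $s =~ s/\pM//g;
    return $s;
}

# The same logic with the argument unpacked explicitly.
sub unaccent_explicit {
    my ($input) = @_;          # copy the first argument out of @_
    my $s = NFKD($input);
    $s =~ s/\pM//g;
    return $s;
}

print unaccent_terse("Les Mis\x{E9}rables"), "\n";
print unaccent_explicit("Les Mis\x{E9}rables"), "\n";
```

The only behavioural difference is that shift also removes the element from @_, whereas the list assignment leaves @_ untouched. For a one-argument subroutine like this it makes no practical difference.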