in reply to Re^2: Perl detect utf8, iso-8859-1 encoding
in thread Perl detect utf8, iso-8859-1 encoding
I'm guessing the bug is in Text::Unaccent, but it's directly using the iconv C library, so I can't easily say for sure.
However, maybe this can work:
    use strict;
    use feature qw(unicode_strings say);
    use Unicode::Normalize 'NFD';

    my $author = "Sch\x{f6}\x{f6}ttl";
    $author = NFD $author;
    $author =~ s/\p{Combining_Diacritical_Marks}//g;
    say $author;
This doesn't include any decode() or encode() of the incoming/outgoing strings. I also think it can break in cases where there are multiple combining characters.
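For what it's worth, here is a rough sketch of the same idea with the decode()/encode() steps added. It assumes you already know the input encoding (detecting it is a separate problem), and strip_marks is just a name I made up for illustration. I have also swapped the block property for \p{Mn} (the nonspacing-mark general category), which is a different technique from the one above: it should catch combining marks outside the Combining Diacritical Marks block as well.

```perl
use strict;
use warnings;
use feature qw(unicode_strings say);
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from any module): takes raw bytes in a known
# encoding and returns UTF-8 bytes with the combining marks removed.
sub strip_marks {
    my ($bytes, $encoding) = @_;
    my $text = decode($encoding, $bytes);   # bytes -> Perl characters
    $text = NFD($text);                     # split base letters from their marks
    $text =~ s/\p{Mn}+//g;                  # drop every nonspacing mark
    return encode('UTF-8', $text);          # characters -> UTF-8 bytes
}

# Assumption: the input here is ISO-8859-1 bytes.
say strip_marks("Sch\xf6\xf6ttl", 'ISO-8859-1');
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through unchanged, so this is not a full replacement for what Text::Unaccent tries to do.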