Ritter has asked for the wisdom of the Perl Monks concerning the following question:

How can I change all those Latin characters with dots, rings, tildes, and other marks in a string to normal, plain English [a-z] characters?

I want "aáÀåäÂã" to become "aaAaaAa". I was pointed at the module Encode, but I can't figure out how to use that module to perform this task.

Manually listing all of these characters (probably several hundred) would be error-prone, so I would be glad to hear of another way.

In case it helps: the strings will always originally be encoded as UTF-8.

Thanks,
Ritter
  • Comment on Converting accented characters e.g. åäâáàã to aaaaaa

Replies are listed 'Best First'.
Re: Converting accented characters e.g. åäâáàã to aaaaaa
by ikegami (Patriarch) on Nov 10, 2006 at 16:16 UTC

    Perhaps Text::StripAccents? Definitely not Encode (although you may need it too).

    Update: Text::StripAccents seems to require the string to be iso-latin-1 encoded. Text::Unaccent should work no matter what the encoding.

Re: Converting accented characters e.g. åäâáàã to aaaaaa
by zentara (Cardinal) on Nov 10, 2006 at 17:08 UTC
      thanks, Unicode::Normalize solved it (leaving the first two proposals untested)
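
For later readers, here is a minimal sketch of how the Unicode::Normalize approach might look. The thread does not show the code that was actually used, so this is an assumption: decode the UTF-8 bytes with Encode, decompose with NFD so that each accented letter becomes a base letter followed by combining marks, then delete the marks. The helper name strip_accents is hypothetical and not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFD);

# strip_accents is a hypothetical helper name used only for this sketch.
# It expects UTF-8 encoded bytes, as described in the question.
sub strip_accents {
    my ($octets) = @_;
    my $text = decode('UTF-8', $octets);   # UTF-8 bytes -> Perl character string
    $text = NFD($text);                    # e.g. U+00E1 -> "a" + U+0301 (combining acute)
    $text =~ s/\p{Mn}//g;                  # remove nonspacing combining marks
    return $text;
}

# The sample from the question, written with escapes so that the
# encoding of this source file does not matter.
my $input = encode('UTF-8', "a\x{E1}\x{C0}\x{E5}\x{E4}\x{C2}\x{E3}");
print strip_accents($input), "\n";         # prints "aaAaaAa"
```

Note that this only handles characters that have a canonical decomposition. Letters such as "ø", "ß" or "æ" do not decompose into a base letter plus a mark, so they pass through unchanged and would need an explicit mapping (or one of the modules suggested above).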