Locutus has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I've searched CPAN and PerlMonks for any hints on proven and tested modules that can romanize non-Latin Unicode, i.e. transform a given Unicode string into its phonetic notation in Latin characters. With Unicode::Transliterate and Text::Unidecode I found two that seem to do what I need - at least according to the examples contained in their documentation.
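For reference, this is roughly the kind of call I mean. It is only a minimal sketch, assuming Text::Unidecode is installed; the sample strings are my own illustrations, not taken from the module's documentation. As far as I understand its documentation, unidecode() maps each character to an ASCII approximation without looking at context, so the result is not guaranteed to follow any particular official romanization scheme.

```perl
use strict;
use warnings;
use utf8;                             # the string literals below are UTF-8 source text
use Text::Unidecode;                  # exports unidecode() by default

binmode STDOUT, ':encoding(UTF-8)';   # so the original strings print correctly

# Illustrative inputs only (Cyrillic, Chinese, Korean).
for my $text ('Москва', '北京', '서울') {
    # unidecode() returns a plain-ASCII approximation of its argument
    printf "%s => %s\n", $text, unidecode($text);
}
```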

But I'm surprised by their latest version numbers and release dates: 0.3 from 13 Jul 2001 and 0.04 from 15 Jul 2002, respectively. Some romanization schemes (e.g. for Korean script) are much newer, so I wonder: Are these modules still state-of-the-art? Are there perhaps more reliable ones on CPAN that my search terms simply missed?

Thanks for your advice!

Re: Transliterating non-latin Unicode - a job for Perl?
by Corion (Patriarch) on Nov 11, 2013 at 12:10 UTC

    Text::Unidecode has some patches pending on RT, but none of them seem to address your concern of romanizing according to other/newer schemes. It seems that Mike Doherty has a Text::Unidecode GitHub repository where he collects and applies some patches, but so far, none of these have made it into a new CPAN release.

    A good approach to give some love to this module would be to ask Sean Burke about adopting the module or getting co-maintainership for a new release. That would still not add the new romanization schemes, but at least then you could work on them.

Re: Transliterating non-latin Unicode - a job for Perl?
by Jim (Curate) on Nov 11, 2013 at 16:52 UTC

    Take a look at Lingua::Translit. It's newer than the other CPAN modules and, according to its documentation, is extensible. You can apply any number of different transliteration schemes with it.
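    A minimal sketch of how that looks in practice, assuming Lingua::Translit is installed and using its "ISO 9" table (Cyrillic) purely as an example; the sample word is my own choice. The module expects character-oriented (decoded) strings, hence the `use utf8` for the literal.

    ```perl
    use strict;
    use warnings;
    use utf8;                             # the literal below is UTF-8 source text
    use Lingua::Translit;

    binmode STDOUT, ':encoding(UTF-8)';   # output may contain non-ASCII Latin letters

    # Pick a transliteration table by name; "ISO 9" is one of the bundled tables.
    my $tr = Lingua::Translit->new('ISO 9');

    my $latin = $tr->translit('Москва');  # illustrative input only
    print "$latin\n";

    # Some tables are reversible; check before converting back.
    if ( $tr->can_reverse ) {
        print $tr->translit_reverse($latin), "\n";
    }
    ```

    Switching schemes is then a matter of passing a different table name to new(), which is what makes it more flexible than a single fixed mapping.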

    Jim