
The closest to a generic solution is Text::Unidecode's unidecode.
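Text::Unidecode is a Perl module. As a rough illustration of the idea only, and not that module's algorithm, here is a Python sketch that uses the standard `unicodedata` module to decompose each character and drop combining accents before comparing names. This is a cruder technique than transliteration: letters with no decomposition, such as ø or ß, pass through unchanged, whereas a transliteration table like unidecode's is meant to map those to ASCII as well. The helper names `fold_accents` and `same_name` are hypothetical.

```python
import unicodedata


def fold_accents(s: str) -> str:
    """Hypothetical helper: strip accents by NFKD-decomposing and dropping combining marks.

    Not Text::Unidecode: characters without a decomposition (e.g. 'ø', 'ß')
    are left as they are.
    """
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names ignoring accents and case."""
    return fold_accents(a).casefold() == fold_accents(b).casefold()


print(fold_accents("Mésange bleue"))  # Mesange bleue
print(same_name("Grünfink", "GRUNFINK"))  # True
```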

An alternative tack would be to measure how different two strings are, and to consider the two the same if the difference is sufficiently small. One measure of difference is the Hamming distance, which counts the positions at which two strings of equal length differ.
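A minimal Python sketch of the Hamming-distance approach, assuming a hypothetical threshold `max_diff` of 1 as the meaning of "sufficiently small". Because Hamming distance is only defined for strings of equal length, it cannot compare names where a whole word has been added or dropped; the sketch raises an error in that case rather than guessing.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def roughly_same(a: str, b: str, max_diff: int = 1) -> bool:
    """Treat two names as the same if they differ in at most max_diff positions.

    max_diff=1 is an arbitrary, hypothetical default.
    """
    return hamming(a, b) <= max_diff


print(hamming("Bar-tailed Lark", "Bar-tailed lark"))  # 1
print(roughly_same("Bar-tailed Lark", "Bar-tailed lark"))  # True
```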

Re^8: One bird, two Unicode names
by RCH (Sexton) on Mar 14, 2011 at 08:10 UTC
    Better and better!
    I'd been wondering how to analyse the differences between names; e.g. Ammomanes cinctura is either the "Bar-tailed Desert lark" or the "Bar-tailed Lark".

    With your kind hint, I found my way to Text::Brew, which not only tells me that the distance between "Bar-tailed Desert lark" and "Bar-tailed Lark" is 8, but also that the path is to DEL < Desert> and to SUBST <l,L>.
    Many thanks

    RichardH
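Text::Brew reports both a distance and an edit path. The Python sketch below illustrates that idea with the textbook Levenshtein dynamic-programming table plus a backtrace; it is not Text::Brew's interface or output format. The labels DEL and SUBST follow the post, while MATCH and INS are hypothetical labels chosen for this sketch. When several cheapest paths exist, the backtrace returns just one of them, so the deleted span may come out as "Desert " rather than " Desert".

```python
def edit_path(src: str, dst: str):
    """Levenshtein distance plus one cheapest edit script turning src into dst.

    Returns (distance, ops), where ops is a list of tuples:
    ("MATCH", c), ("SUBST", old, new), ("DEL", c) or ("INS", c).
    """
    n, m = len(src), len(dst)
    # d[i][j] = cost of turning src[:i] into dst[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete every character of src[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert every character of dst[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete src[i-1]
                d[i][j - 1] + 1,         # insert dst[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover one optimal path.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (src[i - 1] != dst[j - 1]):
            if src[i - 1] == dst[j - 1]:
                ops.append(("MATCH", src[i - 1]))
            else:
                ops.append(("SUBST", src[i - 1], dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("DEL", src[i - 1]))
            i -= 1
        else:
            ops.append(("INS", dst[j - 1]))
            j -= 1
    ops.reverse()
    return d[n][m], ops


dist, ops = edit_path("Bar-tailed Desert lark", "Bar-tailed Lark")
print(dist)  # 8
# Show only the actual edits, dropping the matches.
print([op for op in ops if op[0] != "MATCH"])
```

Any cheapest path here must consist of exactly seven deletions and one substitution, since the names differ in length by seven and "Bar-tailed Desert lark" contains no capital L.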