in reply to Foreign language characters...

Here's a shortened version of a method I have used in the past:

my @names = qw/Árni Óli Þorgeir Ýr Ægir Þór /;
my @names = fix_chars(@names);
print "$_\n" for @names;

sub fix_chars {
    my @ok;
    for (@_) {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/g;
        s/Æ/Ae/g;
        s/þ/th/g;
        s/æ/ae/g;
        s/\W/_/g;    # Replace any remaining non-word chars with an underscore
        push @ok, $_;
    }
    return @ok;
}

This is intended for converting Icelandic names to plain ASCII.

It uses transliteration (tr///) for the characters that map to a single letter, and substitution (s///) for the ones that expand to two letters.

You can of course decide for yourself what characters you will allow and disallow.
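
As a minimal sketch of why the two operators are split up this way: tr/// can only map one character to one character, while s/// can replace a match with a string of any length. The helper name to_ascii is made up for this illustration, and the sketch assumes the source file is saved as UTF-8 (hence use utf8).

```perl
use strict;
use warnings;
use utf8;                               # the literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper, not part of the original snippet: returns an
# ASCII copy of one string and leaves its argument alone.
sub to_ascii {
    my ($s) = @_;

    # tr/// maps each character in the first list to the character at
    # the same position in the second list (one-to-one only).
    $s =~ tr/ÁÓÝáóý/AOYaoy/;

    # s///g replaces every match with a string of any length, which is
    # what the one-to-two replacements need.
    $s =~ s/Þ/Th/g;
    $s =~ s/Æ/Ae/g;

    return $s;
}

print to_ascii($_), "\n" for qw/Árni Þór Ægir/;
# prints Arni, Thor and Aegir, one per line
```

Only a handful of characters are listed here to keep the sketch short; the full lists are the ones in fix_chars.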

--
Regards,
Helgi Briem
helgi AT decode DOT is

Re2: Foreign language characters...
by blakem (Monsignor) on Oct 09, 2002 at 04:27 UTC
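    The replacements are good, but the interface is a bit murky. It returns the "fixed" list, but it also modifies the original list, because $_ inside the for loop is an alias for each element of @_, and those in turn are aliases for the caller's variables. For instance, the output of your snippet is the same if you simply call: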
    fix_chars(@names);
    instead of:
    my @names = fix_chars(@names);
    I would rewrite it so that it either left the original list intact, or didn't return the "fixed" list. Something like:
    sub fix_chars {
        my @fixed = @_;
        for (@fixed) {
            tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
            ...
        }
        return @fixed;
    }

    -Blake
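
The difference between the two interfaces discussed in this reply can be seen in a small sketch. The names fix_in_place and fix_copy are made up for the illustration, and the replacement is reduced to a single s/// so that only the aliasing behaviour is on display.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the "modify the caller's list" interface.
sub fix_in_place {
    s/\W/_/g for @_;    # $_ aliases @_, which aliases the caller's variables
    return;
}

# Hypothetical stand-in for the "leave the original intact" interface.
sub fix_copy {
    my @fixed = @_;     # copy first, so the caller's list is untouched
    s/\W/_/g for @fixed;
    return @fixed;
}

my @names = ('a b', 'c-d');

my @copy = fix_copy(@names);
print join(',', @names), ' | ', join(',', @copy), "\n";
# prints: a b,c-d | a_b,c_d

fix_in_place(@names);
print join(',', @names), "\n";
# prints: a_b,c_d
```

Either interface is fine on its own; the murkiness comes from doing both at once.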