in reply to Removing Unsafe Characters

Did you have a look at Text::Unidecode? It is not perfect (accented characters come out as their non-accented form), but it does a good job with the really exotic ones.

From its docs:

What Text::Unidecode provides is a function, unidecode(...) that takes Unicode data and tries to represent it in US-ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F). The representation is almost always an attempt at transliteration -- i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system.
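
For illustration, a minimal sketch of calling unidecode() directly. It assumes Text::Unidecode is installed from CPAN and that the script is saved as UTF-8; the sample strings are made up for this example.

```perl
use strict;
use warnings;
use utf8;                      # the string literals below are UTF-8
use Text::Unidecode;           # exports unidecode() by default

binmode STDOUT, ':encoding(UTF-8)';

# Accented Latin letters lose their accents.
my $ascii = unidecode("déjà vu");
print "$ascii\n";              # prints "deja vu"

# Text in other scripts is transliterated into ASCII.
my $roman = unidecode("北京");
print "$roman\n";
```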

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^2: Removing Unsafe Characters
by ikegami (Patriarch) on Apr 28, 2009 at 13:17 UTC

    It is not perfect (accented characters come out as their non-accented form)

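    Easily fixed: only pass the runs of non-Latin characters to unidecode, so accented Latin letters are left alone.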

    $text =~ s/(\P{Latin}+)/unidecode("$1")/ge;
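
To see the difference, here is a small self-contained sketch that runs the substitution above next to a plain unidecode() call. The sample string and the variable names are made up for this example. It also assumes the input uses precomposed characters: decomposed combining accents are probably not in the Latin script, so they would still be handed to unidecode.

```perl
use strict;
use warnings;
use utf8;                      # the string literal below is UTF-8
use Text::Unidecode;

binmode STDOUT, ':encoding(UTF-8)';

my $text = "café 北京";         # hypothetical input: Latin plus Han

# Transliterate only the runs of characters outside the Latin script.
# /e evaluates the replacement as code, /g repeats it for every run.
(my $kept = $text) =~ s/(\P{Latin}+)/unidecode("$1")/ge;

# For comparison: transliterate the whole string.
my $flat = unidecode($text);

print "$kept\n";               # the accented "café" is kept as-is
print "$flat\n";               # everything is reduced to ASCII
```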