In short, you strip all accents from user-entered search terms and from the database entries used for searching (leaving the "display" database entries alone), i.e. map Ä to A, å to a, and so on.
So if someone tries to look up "Äbc" in your file, it is converted to "Abc" before it is matched against the file. Professionally I work with a lot of bands and find this conversion very handy. For example, most people will spell Moxy Früvous as "Moxy Fruvous" (without the umlaut over the u) when performing a search, so unless I applied the conversion both to the stored search keys and to the incoming search terms, they would never find it.
Now this is my very nasty-looking translation statement that will replace accented characters (ISO-8859-1) with their non-accented ASCII equivalents.
As I said, dirty and imperfect, but quick, and it generally works right:
$s =~ tr/\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD8\xD9\xDA\xDB\xDC\xDD\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF1\xF2\xF3\xF4\xF5\xF6\xF8\xF9\xFA\xFB\xFC\xFD\xFF/\x41\x41\x41\x41\x41\x41\x41\x43\x45\x45\x45\x45\x49\x49\x49\x49\x44\x4E\x4F\x4F\x4F\x4F\x4F\x4F\x55\x55\x55\x55\x59\x73\x61\x61\x61\x61\x61\x61\x61\x63\x65\x65\x65\x65\x69\x69\x69\x69\x6E\x6F\x6F\x6F\x6F\x6F\x6F\x75\x75\x75\x75\x79\x79/;
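For what it's worth, here is a rough sketch of how the translation might be wrapped up and applied to both sides of a lookup. The names strip_accents, find_band, @bands and %index are made up for illustration, and the hash stands in for whatever database or file you actually search. The tr lists are meant to be the same mapping as above, just written with ranges and literal letters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the string with ISO-8859-1
# accented letters replaced by plain ASCII letters.
sub strip_accents {
    my ($s) = @_;
    $s =~ tr/\xC0-\xD6\xD8-\xDD\xDF-\xEF\xF1-\xF6\xF8-\xFD\xFF/AAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiinoooooouuuuyy/;
    return $s;
}

# Display names stay untouched (\xFC is u-umlaut, \xF6 is o-umlaut).
my @bands = ("Moxy Fr\xFCvous", "Mot\xF6rhead", "Bj\xF6rk");

# Search index: accent-stripped key => original display name.
my %index = map { strip_accents($_) => $_ } @bands;

# Hypothetical lookup: strip the query the same way before matching.
sub find_band {
    my ($query) = @_;
    return $index{ strip_accents($query) };
}
```

With this, find_band("Moxy Fruvous") and find_band("Moxy Fr\xFCvous") both land on the same stored entry, and the display name keeps its umlaut. It inherits the same rough edges as the one-liner: Æ simply becomes A and ß becomes a single s.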
Les Howard
www.lesandchris.com
Author of Net::Syslog and Number::Spell