in reply to Problem matching non-english chars
In short, strip all accents from both the user-entered search terms and the database entries (at least the entries that are used for searching; leave the "display" entries alone), i.e. map Ä to A, å to a, and so on.
So if someone tries to look up "Äbc" in your file, it would be converted to "Abc" before it is matched against the file. Professionally I work with a lot of bands, and I find this conversion very handy. For example, most people will spell Moxy Früvous as "Moxy Fruvous" (without the umlaut over the u) when performing a search, so unless I removed accents both when storing the name and when cleaning up the search term, the search would never find it.
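As a rough illustration of the idea above, here is a minimal sketch that keeps the display name untouched and normalizes only the lookup key and the search term. The sub name `search_key`, the hash `%by_key`, and the tiny six-character mapping are assumptions made up for this example; the full ISO-8859-1 table is the translation statement given later in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: strips accents from a copy of the text.
# Only a handful of ISO-8859-1 letters are mapped here, for brevity.
sub search_key {
    my ($text) = @_;
    (my $key = $text) =~ tr/\xC4\xE4\xD6\xF6\xDC\xFC/AaOoUu/;
    return $key;
}

# Display names are stored as entered (\xFC is u-umlaut, \xC4 is A-umlaut).
my @names = ("Moxy Fr\xFCvous", "\xC4bc");

# Index each display name under its accent-free key.
my %by_key;
$by_key{ search_key($_) } = $_ for @names;

# The user's search term goes through the same normalization.
my $hit = $by_key{ search_key("Moxy Fruvous") };
print defined $hit ? "found\n" : "not found\n";
```

Because both sides pass through the same function, it does not matter whether the user typed the umlaut or not; the display copy is what you show once the match is found.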
Now this is my very nasty-looking translation statement that will replace accented characters (ISO-8859-1) with their non-accented ASCII equivalents.
As I said, dirty and imperfect, but quick, and it generally works right:
$s =~ tr/\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD8\xD9\xDA\xDB\xDC\xDD\xDF\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF\xF1\xF2\xF3\xF4\xF5\xF6\xF8\xF9\xFA\xFB\xFC\xFD\xFF/\x41\x41\x41\x41\x41\x41\x41\x43\x45\x45\x45\x45\x49\x49\x49\x49\x44\x4E\x4F\x4F\x4F\x4F\x4F\x4F\x55\x55\x55\x55\x59\x73\x61\x61\x61\x61\x61\x61\x61\x63\x65\x65\x65\x65\x69\x69\x69\x69\x6E\x6F\x6F\x6F\x6F\x6F\x6F\x75\x75\x75\x75\x79\x79/;
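For readers who find the single long statement hard to check, the same mapping can probably be split into one tr/// per target letter using ranges. This is only a sketch of an equivalent form, not a replacement for the statement above; the wrapper name `strip_accents` is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the translation; returns a cleaned copy.
sub strip_accents {
    my ($s) = @_;
    # When the replacement list is shorter than the search list, tr///
    # repeats its last character, so a whole range maps to one letter.
    $s =~ tr/\xC0-\xC6/A/;       # A-grave .. AE ligature
    $s =~ tr/\xC7/C/;            # C-cedilla
    $s =~ tr/\xC8-\xCB/E/;
    $s =~ tr/\xCC-\xCF/I/;
    $s =~ tr/\xD0\xD1/DN/;       # Eth, N-tilde
    $s =~ tr/\xD2-\xD6\xD8/O/;   # skips \xD7 (multiplication sign)
    $s =~ tr/\xD9-\xDC/U/;
    $s =~ tr/\xDD\xDF/Ys/;       # Y-acute, sharp s (skips \xDE, Thorn)
    $s =~ tr/\xE0-\xE6/a/;       # a-grave .. ae ligature
    $s =~ tr/\xE7/c/;
    $s =~ tr/\xE8-\xEB/e/;
    $s =~ tr/\xEC-\xEF/i/;
    $s =~ tr/\xF1/n/;            # skips \xF0 (eth)
    $s =~ tr/\xF2-\xF6\xF8/o/;   # skips \xF7 (division sign)
    $s =~ tr/\xF9-\xFC/u/;
    $s =~ tr/\xFD\xFF/y/;        # skips \xFE (thorn)
    return $s;
}

print strip_accents("Moxy Fr\xFCvous"), "\n";
```

Part of the "imperfect" is inherent to tr///: it maps one character to exactly one character, so the sharp s becomes a single "s" and the AE ligature a single "A" rather than "ss" or "AE".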
Les Howard
www.lesandchris.com
Author of Net::Syslog and Number::Spell
RE: Re: Problem matching non-english chars
by Guano (Initiate) on Apr 20, 2000 at 11:21 UTC