in reply to Re^4: The Björk Situation
in thread The Björk Situation
http://en.wikipedia.org/wiki/Eth_(letter) says "the letter had its origin as a d with a cross-stroke added". I don't think d is such a bad transliteration then.
In my view, it's the thorn (þ) that should become th. And in fact, Text::Unidecode does so.
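To see this for yourself, here is a minimal sketch, assuming Text::Unidecode is installed from CPAN. It only uses the module's exported unidecode() function; the sample strings are my own picks.

```perl
use strict;
use warnings;
use utf8;                       # the string literals below are UTF-8
use Text::Unidecode;            # CPAN module; exports unidecode()

binmode STDOUT, ':encoding(UTF-8)';

# Eth and thorn in both cases, plus the thread's namesake.
# These sample strings are illustrative choices, not from the module docs.
for my $text ('ð', 'Ð', 'þ', 'Þ', 'Björk') {
    printf "%s => %s\n", $text, unidecode($text);
}
```

Running it prints each character next to its ASCII replacement, so you can check the eth and thorn mappings against your installed version.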
I do agree with you, though, that all these transliterations lose information. They are still well suited for internal representations, especially in text searches, where you only need things to match.
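As a rough sketch of what I mean by an internal representation for searching: keep the original strings, and use the transliterated form only as a lookup key. This assumes Text::Unidecode is installed; search_key() and %index are hypothetical names made up for this example, not part of the module.

```perl
use strict;
use warnings;
use utf8;                       # the string literals below are UTF-8
use Text::Unidecode;            # CPAN module; exports unidecode()

# Hypothetical helper: reduce a string to a lossy, lowercase ASCII key.
# The key is used for matching only and is never shown to the user.
sub search_key {
    my ($text) = @_;
    return lc(unidecode($text));
}

# Index: key => list of original spellings, which are stored untouched.
my %index;
for my $name ('Björk', 'Bjork', 'Guðmundsdóttir') {
    push @{ $index{ search_key($name) } }, $name;
}

# A query typed without diacritics finds both spellings of the name.
my @hits = @{ $index{ search_key('bjork') } || [] };
```

The information loss stays confined to the keys; whatever you display still comes from the original data.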
Another advantage of Text::Unidecode is that it handles a lot more than what's in the Latin-1 supplement. This quote from the perldoc describes it best: "In other words, Unidecode's approach is broad (knowing about dozens of writing systems), but shallow (not being meticulous about any of them)."
So for speed and generality, I'd recommend it. If you need precision, then transliteration may not be such a good idea at all.
Re^6: The Björk Situation
by japhy (Canon) on Feb 15, 2006 at 22:03 UTC
by rhesa (Vicar) on Feb 15, 2006 at 22:40 UTC