in reply to Re^4: The Björk Situation
in thread The Björk Situation

More quibbling ;)

http://en.wikipedia.org/wiki/Eth_(letter) says "the letter had its origin as a d with a cross-stroke added". I don't think d is such a bad transliteration then.

In my view, it's the thorn (þ) that should become th. And in fact, Text::Unidecode does so.

I do agree with you, though, that all these transliterations lose information. But that loss is what makes them well suited for internal representations, especially in text searches, where variant spellings should collapse to the same key.

Another advantage of Text::Unidecode is that it handles a lot more than what's in the Latin-1 supplement. This quote from the perldoc describes it best: "In other words, Unidecode's approach is broad (knowing about dozens of writing systems), but shallow (not being meticulous about any of them)."

So for speed and generality, I'd recommend it. If you need precision, then transliteration may not be such a good idea at all.
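
To make the text-search point concrete, here is a minimal sketch, assuming Text::Unidecode is installed from CPAN. The search_key helper and the sample names are my own, purely for illustration. It folds each name to a lowercase ASCII key, so an ASCII-only query still finds the original spelling.

```perl
use strict;
use warnings;
use utf8;                 # this source contains literal UTF-8 characters
use Text::Unidecode;      # exports unidecode() by default

# Hypothetical helper (not part of Text::Unidecode): fold a string to a
# lowercase ASCII key for internal lookups.
sub search_key {
    my ($text) = @_;
    return lc( unidecode($text) );
}

# Build a tiny index from folded key to original spelling.
my @names = ( 'Björk', 'Guðmundsdóttir', 'Þór' );
my %index;
for my $name (@names) {
    $index{ search_key($name) } = $name;
}

# An ASCII-only query is folded the same way before the lookup.
my $hit = $index{ search_key('Bjork') };

binmode STDOUT, ':encoding(UTF-8)';
print defined $hit ? "$hit\n" : "no match\n";
```

The key is only used for matching. The original string is kept alongside it, so the information lost in transliteration never reaches the user.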

Re^6: The Björk Situation
by japhy (Canon) on Feb 15, 2006 at 22:03 UTC
    Re-read that wikipedia entry, though: Ð and þ were replaced with th. Besides, "eth" represents the hard "th" sound (in "them") while "thorn" represents the soft "th" sound (in "thin").

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      That the Normans replaced both eth and thorn with th in English doesn't change the fact that eth is the voiced sound, while thorn is the voiceless one. This distinction is still visible in IPA, which uses ð for the voiced dental fricative.
      I believe this distinction also shows in the hip spellings of "the" and "that" as "da" and "dat". I'd say that rappers would vote for Unidecode's decision ;)