Re^2: unicode [A-F] equivalent?

It would not scale at all well, I agree. Luckily I'm not creating a multilingual dictionary, but organizing a list of English-language radio show names. Occasionally an accented character will come through, but anything else will be a surprise.

I did once work on an international Who's Who book. The ordering of names was "solved" by having romanized equivalents. But it was the editor's job to decide the order, not mine.

Re^3: unicode [A-F] equivalent? by Anonymous Monk on Mar 22, 2005 at 14:59 UTC
Even the accented letters are a problem. Accented letters often come from Western or Northern European countries, which all use the ISO Latin-1 alphabet. But while an accented letter may look the same in different countries, it is not the same letter everywhere. In some languages, an accent just means the letter is pronounced differently, but it's still the same letter. In another language, the same accent turns it into a different letter altogether. And even if two languages use the same accented letter, it doesn't necessarily mean the letters sort the same. Which is why we have locales. And which means that whatever solution you pick, there are people who will be surprised. If only we all spoke (and wrote) Egyptian hieroglyphs, we wouldn't have this mess.
Re^4: unicode [A-F] equivalent? by qq (Hermit) on Mar 22, 2005 at 15:43 UTC
I do agree with you in the general case. But honestly, guv, there are extenuating circumstances here. It's an admin interface with very few users, dealing with content that is basically English. The existing interface groups by English alphabetic ranges and would have silently dropped items that did not fit. I did suggest alternatives to the UI team, but they preferred to keep the existing interface. Regardless... I would like to hear better approaches, and minimize surprise to users. The problem splits into (at least) two parts: a) sorting a list that may contain non-English words, and b) grouping said list into groups that map somewhat to English character ranges. We can assume that the target audience is English-speaking. Both of these may be basically impossible to do correctly for all cases. So what's the least surprising behaviour?
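
For what it's worth, the two parts (a) and (b) could be sketched roughly like this. It is only a sketch under assumptions, not a definitive answer: the bucket ranges and the names bucket_for, group_names and @ranges are made up for illustration, since the real UI's ranges aren't given here. It sorts with the default Unicode::Collate ordering (no locale tailoring) and picks the bucket by decomposing the first character with Unicode::Normalize and dropping combining marks, so an accented E lands with plain E. Anything that doesn't reduce to A-Z goes into an "Other" bucket rather than being silently dropped.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);
use Unicode::Collate;

# Hypothetical buckets; substitute whatever ranges the interface really uses.
my @ranges = ( [ 'A', 'F' ], [ 'G', 'L' ], [ 'M', 'R' ], [ 'S', 'Z' ] );

# (b) Map a name to a bucket label such as "A-F", or "Other".
sub bucket_for {
    my ($name) = @_;
    my $base = NFD($name);      # split accented letters into base + marks
    $base =~ s/\p{Mn}+//g;      # drop the combining marks
    my $first = uc substr( $base, 0, 1 );
    for my $range (@ranges) {
        my ( $lo, $hi ) = @$range;
        return "$lo-$hi" if $first ge $lo && $first le $hi;
    }
    return 'Other';             # digits, punctuation, letters with no A-Z base
}

# (a) Sort with the default Unicode collation, then (b) group.
sub group_names {
    my @names    = @_;
    my $collator = Unicode::Collate->new;
    my %groups;
    push @{ $groups{ bucket_for($_) } }, $_ for $collator->sort(@names);
    return \%groups;
}

# Sample data written with escapes so the source file stays ASCII.
binmode STDOUT, ':encoding(UTF-8)';
my $groups = group_names(
    "\x{C9}clair Hour", 'apple talk', 'Zebra',
    "\x{C5}ngstr\x{F6}m", '2000 Today',
);
for my $bucket ( sort keys %$groups ) {
    print "$bucket: ", join( ', ', @{ $groups->{$bucket} } ), "\n";
}
```

Letters that don't decompose to a base letter plus marks (such as a slashed O) end up in "Other" with this approach. Whether that is less surprising than guessing a nearby English letter is a judgment call for the UI.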

