PerlMonks  

Re^2: unicode [A-F] equivalent?

by qq (Hermit)
on Mar 22, 2005 at 14:07 UTC


in reply to Re: unicode [A-F] equivalent?
in thread unicode [A-F] equivalent?

It would not scale at all well, I agree. Luckily I'm not creating a multilingual dictionary, but organizing a list of English-language radio show names. Occasionally an accented character will come through, but anything else will be a surprise.

I did once work on an international Who's Who book. The ordering of names was "solved" by having romanized equivalents. But it was the editor's job to decide the order, not mine.

Replies are listed 'Best First'.
Re^3: unicode [A-F] equivalent?
by Anonymous Monk on Mar 22, 2005 at 14:59 UTC
    Even the accented letters are a problem. Accented letters often come from Western or Northern European countries, which all use the ISO Latin-1 alphabet. But while an accented letter may look the same in different countries, it is not the same letter everywhere. In some languages, an accent just means the letter is pronounced differently, but it's still the same letter. In another language, the same accent makes it a different letter altogether. And even if two languages use the same accented letter, it doesn't necessarily mean that the letters sort the same.

    Which is why we have locales. And which means that whatever solution you pick, there are people who will be surprised.

    If only we all spoke (and wrote) Egyptian hieroglyphs, we would not have this mess.
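To make the point about locales concrete, here is a minimal sketch in Python of the same accented letter sorting differently under two conventions. The two tables are hand-written simplifications assumed for illustration (Swedish treats å, ä, ö as separate letters after z; German dictionary order treats ä like a). Real code should take its ordering from locale data, not from tables like these.

```python
# Simplified, hand-written orderings for illustration only.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"  # å, ä, ö come after z
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}  # ä sorts like a


def swedish_key(word):
    # Characters outside the table sort after everything in it.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK)) for ch in word.lower()]


def german_key(word):
    # Fold umlauts onto their base letters before comparing.
    return "".join(GERMAN_FOLD.get(ch, ch) for ch in word.lower())


words = ["zebra", "ära", "arm"]
print(sorted(words, key=swedish_key))  # ['arm', 'zebra', 'ära']
print(sorted(words, key=german_key))   # ['ära', 'arm', 'zebra']
```

The same three strings come out in a different order, so a reader used to one convention will be surprised by the other.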

      I do agree with you in the general case. But honestly, guv, there are extenuating circumstances here. It's an admin interface with very few users, dealing with content that is basically English. The existing interface groups by English alphabetic ranges and would have silently dropped items that did not fit. I did suggest alternatives to the UI team, but they preferred to keep the existing interface. Regardless...

      I would like to hear better approaches, and to minimize surprise to users. The problem splits into (at least) two parts: a) sorting a list that may contain non-English words, and b) grouping said list into groups that map somewhat to English character ranges. We can assume that the target audience is English-speaking. Both of these may be basically impossible to do correctly for all cases. So what's the least surprising behaviour?
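One possible answer to parts a) and b), sketched in Python purely for illustration (Perl's Unicode::Normalize provides the same NFD decomposition): strip accents to get a sort key, bucket by the first unaccented letter, and put everything else in a visible catch-all group so that nothing is silently dropped. The ranges, the "Other" bucket, and the function names are assumptions, since the thread does not say how the real interface is divided.

```python
import unicodedata

# Hypothetical ranges; the real interface's groups are not given in the thread.
RANGES = [("a", "f"), ("g", "l"), ("m", "r"), ("s", "z")]


def base_key(name):
    """Fold a name to unaccented, case-insensitive form for sorting."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def bucket_for(name):
    """Return the label of the English range the name falls into."""
    first = base_key(name)[:1]
    for lo, hi in RANGES:
        if lo <= first <= hi:
            return f"{lo.upper()}-{hi.upper()}"
    # Digits, punctuation and non-Latin scripts are shown here, not dropped.
    return "Other"


def group_names(names):
    """Sort names by their folded key and collect them per range."""
    groups = {}
    for name in sorted(names, key=base_key):
        groups.setdefault(bucket_for(name), []).append(name)
    return groups


shows = ["Émission", "Zoo Hour", "álbum", "Banter", "9 O'Clock News", "Ωmega"]
for label, members in group_names(shows).items():
    print(label, members)
```

Letters with no accent-based decomposition, such as ø or æ, also fall into "Other" under this scheme. That is an arbitrary choice, but an English-speaking user at least sees the item somewhere.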
