in reply to Re: best sort
in thread best sort
Calling sort without a comparison function is quite often the wrong thing to do, even on plain text.
There are a couple of things that sorting can do for you (telling you which direction to look, and keeping identical items together), but they won't necessarily happen if you supply your own comparison function.
Those of us accustomed to dealing solely with 7-bit English characters take advantage of these properties, which fall out nicely from the default comparison function. For example, if I am looking for the word "macnulty" and I find "machinery" in the sorted output, I know I must look later in the list, and I know that "mable" cannot appear after either of them. I further know that all occurrences of "machinery" appear together, so, having found "machinery" followed by something else, I will never again see "machinery" in the list. This is true of all prefixes of strings as well as complete strings.
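A minimal sketch of those two properties, written in Python for brevity (the default string comparison there is by code point, which behaves the same way for 7-bit text). The word list and the helper names `find` and `prefix_range` are made up for illustration.

```python
from bisect import bisect_left

words = ["macnulty", "mable", "machinery", "mcnulty", "machinery", "mace"]
ordered = sorted(words)  # default comparison: plain code-point order

def find(sorted_words, target):
    """Binary search; only valid because the list is in default order."""
    i = bisect_left(sorted_words, target)
    if i < len(sorted_words) and sorted_words[i] == target:
        return i
    return -1

def prefix_range(sorted_words, prefix):
    """Return (lo, hi) such that sorted_words[lo:hi] holds every word
    starting with prefix. Under the default order they are contiguous."""
    lo = bisect_left(sorted_words, prefix)
    hi = lo
    while hi < len(sorted_words) and sorted_words[hi].startswith(prefix):
        hi += 1
    return lo, hi

print(ordered)
print(find(ordered, "macnulty"))
print(prefix_range(ordered, "mac"))
```

Both helpers quietly depend on the producer having used the default comparison; hand them a list sorted by some other rule and they can miss entries that are present.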
When one ventures outside of 7-bit code points, as a good citizen of the world must, or into domain-specific applications, like surnames, the "proper" sort order may not preserve these properties. "machinery" may follow both "macnulty" and "mable". I might encounter "mcnulty" between two distinct occurrences of "macnulty" unless someone has been careful to tie-break nominally (an appropriate term in this case) identical surnames using something very like the default comparison function.
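To make the tie-breaking point concrete, here is a small sketch, again in Python. The folding rule `fold` (treat a leading "mc" as "mac") is a hypothetical stand-in for a real surname rule, not anything a particular directory actually uses.

```python
def fold(name):
    # Hypothetical surname rule: a leading "mc" sorts as if spelled "mac".
    return "mac" + name[2:] if name.startswith("mc") else name

names = ["macnulty", "mcnulty", "macnulty", "machinery", "mable"]

# Folded key only: "mcnulty" and "macnulty" compare equal, so the sort
# leaves them in whatever relative order they arrived in.
loose = sorted(names, key=fold)

# Folded key first, then the raw string as a tie-breaker, which is the
# default comparison applied among nominally identical names.
strict = sorted(names, key=lambda s: (fold(s), s))

print(loose)   # "mcnulty" sits between the two "macnulty" entries
print(strict)  # identical spellings are adjacent again
```

The tie-break restores grouping of identical strings, but not the other property: under the folded order a consumer still cannot use plain string comparison to decide which direction to look.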
The comparison function is a contract, of sorts, between the producer of and the consumers of the sorted output. If the producer and consumers have different expectations, something unseemly is likely to occur. One of my first assignments was to sort white pages listings using rules that were so complex (where does "General John Smith Jr" sort relative to "Doctor John J Smith III"?) that I could be certain that the average white pages user had no clue what the rules actually were. Fortunately, most lists of names were short enough that users could spot the right name without understanding the sort order. Sorting has its place, but indexing on N-grams is proving to be a more user-friendly search mechanism.
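For comparison, a rough sketch of what N-gram lookup might look like, in Python. Everything here is an assumption for illustration: trigrams over lower-cased text, ranking by number of shared trigrams, and the names `ngrams`, `build_index` and `search`. It is not a description of any particular directory system.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """All distinct substrings of length n (empty set if text is shorter)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(names, n=3):
    """Map each n-gram to the set of names containing it."""
    index = defaultdict(set)
    for name in names:
        for gram in ngrams(name.lower(), n):
            index[gram].add(name)
    return index

def search(index, query, n=3):
    """Names sharing at least one n-gram with query, best match first."""
    hits = defaultdict(int)
    for gram in ngrams(query.lower(), n):
        for name in index.get(gram, ()):
            hits[name] += 1
    # Most shared n-grams first; alphabetical among equal scores.
    return sorted(hits, key=lambda name: (-hits[name], name))

listings = ["General John Smith Jr", "Doctor John J Smith III", "Mary Jones"]
index = build_index(listings)
print(search(index, "smith"))
```

The user never needs to know where titles, initials or suffixes fall in the sort order; any fragment of the name they remember is enough to retrieve the candidates.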
Re^3: best sort
by tchrist (Pilgrim) on Aug 16, 2011 at 15:48 UTC
by jpl (Monk) on Aug 16, 2011 at 16:27 UTC
by tchrist (Pilgrim) on Aug 16, 2011 at 17:30 UTC
by jpl (Monk) on Aug 16, 2011 at 18:36 UTC