in reply to Re^2: best sort
in thread best sort

There are a couple of things that sorting can do for you, but they won't necessarily happen if you supply your own comparison function.
  • You can tell if some string you are interested in would come before or after an arbitrary item in the sorted output.
  • All "identical" items will appear together.
As you say, it won’t necessarily happen, but it usually does, because you know what your comparison function is doing.
Tail:
menorrhoeic pinnisected chimaerid weighable foregone plastochrone Alf gravamen hemen preabdomen metanomen praenomen Carmen acumen tegumen bitumen prolusion Malabar watercress usheress speechcraft unshrew windingly xyzzy
Length:
Alf hemen xyzzy acumen Carmen bitumen Malabar tegumen unshrew foregone gravamen usheress chimaerid metanomen praenomen prolusion weighable windingly preabdomen watercress menorrhoeic pinnisected speechcraft plastochrone
By vowels:
Alf Malabar gravamen Carmen watercress praenomen plastochrone acumen metanomen preabdomen hemen speechcraft weighable menorrhoeic tegumen chimaerid pinnisected windingly bitumen foregone prolusion unshrew usheress xyzzy
By consonants:
bitumen chimaerid acumen Carmen foregone gravamen hemen Alf Malabar menorrhoeic metanomen unshrew plastochrone pinnisected preabdomen prolusion praenomen usheress speechcraft tegumen weighable windingly watercress xyzzy
Syllable count:
Alf Carmen foregone hemen speechcraft unshrew acumen bitumen chimaerid gravamen Malabar plastochrone praenomen prolusion tegumen usheress watercress weighable windingly metanomen preabdomen menorrhoeic pinnisected xyzzy
Vowel count:
menorrhoeic weighable metanomen praenomen preabdomen chimaerid plastochrone pinnisected foregone prolusion Malabar gravamen speechcraft watercress tegumen usheress windingly acumen bitumen hemen xyzzy Carmen unshrew Alf
Consonant density:
xyzzy windingly acumen Alf bitumen Carmen chimaerid foregone gravamen hemen Malabar menorrhoeic metanomen pinnisected plastochrone praenomen preabdomen prolusion speechcraft tegumen unshrew usheress watercress weighable
Code point summation:
Alf hemen Carmen xyzzy acumen Malabar bitumen tegumen unshrew gravamen foregone usheress chimaerid weighable metanomen praenomen windingly prolusion preabdomen watercress speechcraft pinnisected menorrhoeic plastochrone
Of those, about the only one I can’t pretty much eyeball is the last one; for the rest, I know what I am comparing and how. If those were the things I was looking into, I would certainly not want a bare sort for any of them.
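As a rough illustration of "knowing what I am comparing and how", here is a minimal sketch of a comparison that would reproduce the order of the Length: listing for a handful of its words. The two keys (length, then case-folded alphabetical order) are inferred from that output; they are an assumption, not the code that actually generated it.

```perl
use strict;
use warnings;

# A handful of words from the lists above.
my @words = qw(Carmen xyzzy Alf acumen hemen Malabar bitumen);

# Primary key: length. Tie-breaker: case-folded alphabetical order.
# Both keys are guesses based on the "Length:" listing.
my @by_length = sort {
    length($a) <=> length($b)
        ||
    lc($a) cmp lc($b)
} @words;

print "@by_length\n";   # Alf hemen xyzzy acumen Carmen bitumen Malabar
```

Because the second key only matters inside a group of equal lengths, the result stays easy to eyeball: find the length first, then scan alphabetically.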

Re^4: best sort
by jpl (Monk) on Aug 16, 2011 at 16:27 UTC
    If those were the things I was looking into, I would certainly not want a bare sort for any of them.

    If you anticipate repeated elements, you'd probably want a tie-breaking

    $a cmp $b
    (the bare sort comparison function) on all but the first, because non-identical terms can otherwise compare equal and identical elements may not be adjacent in the sorted output. But I think we are in fundamental agreement: You need to know what you are comparing and how. That may be easier said than done. Knowing that "machenry" is a Scottish name, but "machinery" is not (or is it, and, if not, why not) makes "knowing what you are comparing" non-trivial.
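The tie-breaking point can be sketched with a toy list. The words and the length-only comparison are made up for illustration; the only thing taken from the post is the idea of appending the bare comparison as the last key.

```perl
use strict;
use warnings;

my @words = qw(cat dog cat emu dog);

# Length alone: every word here compares equal, so nothing forces
# the two "cat"s (or the two "dog"s) to end up next to each other.
my @length_only = sort { length($a) <=> length($b) } @words;

# Length first, then the bare sort comparison as a tie-breaker,
# so identical strings are always adjacent.
my @tie_broken = sort {
    length($a) <=> length($b)
        ||
    $a cmp $b
} @words;

print "@length_only\n";
print "@tie_broken\n";   # cat cat dog dog emu
```

With the tie-breaker in place, two elements compare equal only when they are the same string, which is what brings the duplicates together.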
      If those were the things I was looking into, I would certainly not want a bare sort for any of them.
      If you anticipate repeated elements, you'd probably want a tie-breaking
      $a cmp $b
      Well, sure. Here’s some of the code to generate one of those:
      say $_->{PHRASE} for sort {
          $b->{TOTAL_VOWELS}  <=> $a->{TOTAL_VOWELS}
                  ||
          $b->{MAX_ANY_VOWEL} <=> $a->{MAX_ANY_VOWEL}
                  ||
          $b->{NUM_OF_A}      <=> $a->{NUM_OF_A}
                  ||
          $b->{NUM_OF_E}      <=> $a->{NUM_OF_E}
                  ||
          $b->{NUM_OF_I}      <=> $a->{NUM_OF_I}
                  ||
          $b->{NUM_OF_O}      <=> $a->{NUM_OF_O}
                  ||
          $b->{NUM_OF_U}      <=> $a->{NUM_OF_U}
                  ||
          $b->{NUM_OF_Y}      <=> $a->{NUM_OF_Y}
                  ||
          $a->{DICTFOLD}      cmp $b->{DICTFOLD}
                  ||
          $a->{RECNO}         <=> $b->{RECNO}
      } @records;
      Look more reasonable?
        It helps a lot with the "how I am comparing" part, although if anyone guessed in advance that this was how "By vowels:" sorted, I'd like to solicit their advice on the outcome of upcoming NFL games. It's still not exactly what I had in mind for bringing all identical terms together. The
        $a->{RECNO} <=> $b->{RECNO}
        is unnecessary if the records start out in RECNO order and the sort is stable, as sort() is, by default. Since different words can compare equal under the influence of
        $a->{DICTFOLD} cmp $b->{DICTFOLD}
        (if I'm correctly guessing what DICTFOLD is), I still might see
        word Word word Word
        in the sorted output, when I might have preferred
        word word Word Word
        which makes it easier to determine if words are "unique" without repeating all the complicated logic. So I would prefer
        $a->{ORIGINAL} cmp $b->{ORIGINAL}
        as the final tie-breaker. That's neither better nor worse than your code, merely different.
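A minimal sketch of the difference between the two final tie-breakers. The record layout is hypothetical: DICTFOLD is assumed to be nothing more than the lowercased word, and ORIGINAL and RECNO are filled in by hand for the example.

```perl
use strict;
use warnings;

# Hypothetical records: ORIGINAL is the word as read, DICTFOLD is assumed
# to be its lowercased form, RECNO is its position in the input.
my @records;
for my $word (qw(word Word word Word)) {
    push @records, {
        ORIGINAL => $word,
        DICTFOLD => lc($word),
        RECNO    => scalar @records,
    };
}

# Final tie-breaker on RECNO: equal DICTFOLD keys keep their input order.
my @by_recno = map { $_->{ORIGINAL} } sort {
    $a->{DICTFOLD} cmp $b->{DICTFOLD}
        ||
    $a->{RECNO} <=> $b->{RECNO}
} @records;

# Final tie-breaker on ORIGINAL: identical spellings end up adjacent.
my @by_original = map { $_->{ORIGINAL} } sort {
    $a->{DICTFOLD} cmp $b->{DICTFOLD}
        ||
    $a->{ORIGINAL} cmp $b->{ORIGINAL}
} @records;

print "@by_recno\n";      # word Word word Word
print "@by_original\n";   # Word Word word word
```

Note that a plain cmp puts "Word" ahead of "word", since uppercase letters have lower code points. The grouping is the point here; which spelling comes first within the group would need yet another decision about the comparison.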