Re^3: best sort

There are a couple of things that sorting can do for you, but they won't necessarily happen if you supply your own comparison function.
You can tell if some string you are interested in would come before or after an arbitrary item in the sorted output.
All "identical" items will appear together.

As you say, it won’t necessarily happen, but it usually does, because you know what your comparison function is doing.

Tail:
menorrhoeic pinnisected chimaerid weighable foregone plastochrone Alf gravamen hemen preabdomen metanomen praenomen Carmen acumen tegumen bitumen prolusion Malabar watercress usheress speechcraft unshrew windingly xyzzy
Length:
Alf hemen xyzzy acumen Carmen bitumen Malabar tegumen unshrew foregone gravamen usheress chimaerid metanomen praenomen prolusion weighable windingly preabdomen watercress menorrhoeic pinnisected speechcraft plastochrone
By vowels:
Alf Malabar gravamen Carmen watercress praenomen plastochrone acumen metanomen preabdomen hemen speechcraft weighable menorrhoeic tegumen chimaerid pinnisected windingly bitumen foregone prolusion unshrew usheress xyzzy
By consonants:
bitumen chimaerid acumen Carmen foregone gravamen hemen Alf Malabar menorrhoeic metanomen unshrew plastochrone pinnisected preabdomen prolusion praenomen usheress speechcraft tegumen weighable windingly watercress xyzzy
Syllable count:
Alf Carmen foregone hemen speechcraft unshrew acumen bitumen chimaerid gravamen Malabar plastochrone praenomen prolusion tegumen usheress watercress weighable windingly metanomen preabdomen menorrhoeic pinnisected xyzzy
Vowel count:
menorrhoeic weighable metanomen praenomen preabdomen chimaerid plastochrone pinnisected foregone prolusion Malabar gravamen speechcraft watercress tegumen usheress windingly acumen bitumen hemen xyzzy Carmen unshrew Alf
Consonant density:
xyzzy windingly acumen Alf bitumen Carmen chimaerid foregone gravamen hemen Malabar menorrhoeic metanomen pinnisected plastochrone praenomen preabdomen prolusion speechcraft tegumen unshrew usheress watercress weighable
Code point summation:
Alf hemen Carmen xyzzy acumen Malabar bitumen tegumen unshrew gravamen foregone usheress chimaerid weighable metanomen praenomen windingly prolusion preabdomen watercress speechcraft pinnisected menorrhoeic plastochrone
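
As a rough illustration of "knowing what you are comparing and how", here is a minimal sketch of two comparators that happen to reproduce the `Length:` and `Tail:` orderings above. It is a guess at the kind of comparison involved, not the code that actually produced those lists; the variable names are made up for the example.

```perl
use strict;
use warnings;

# The same 24 words as in the lists above.
my @words = qw(
    menorrhoeic pinnisected chimaerid weighable foregone plastochrone
    Alf gravamen hemen preabdomen metanomen praenomen Carmen acumen
    tegumen bitumen prolusion Malabar watercress usheress speechcraft
    unshrew windingly xyzzy
);

# Length: shorter words first; within one length, alphabetical ignoring case.
my @by_length = sort { length($a) <=> length($b) || lc($a) cmp lc($b) } @words;

# Tail: compare the words spelled backwards, ignoring case.
my @by_tail = sort { lc(scalar reverse $a) cmp lc(scalar reverse $b) } @words;

print "Length: @by_length\n";
print "Tail:   @by_tail\n";
```

Both comparators are short enough that the position of a new word in the output can be predicted by eye, which is the point being made.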

Of those, the last is about the only one I can’t pretty much eyeball; for the rest, I know what I am comparing and how. If those were the things I was looking into, I would certainly not want a bare sort for any of them.

Re^4: best sort by jpl (Monk) on Aug 16, 2011 at 16:27 UTC
"If those were the things I was looking into, I would certainly not want a bare sort for any of them."

If you anticipate repeated elements, you'd probably want a tie-breaking `$a cmp $b` (the bare sort comparison function) on all but the first, because non-identical terms can otherwise compare equal and identical elements may not be adjacent in the sorted output. But I think we are in fundamental agreement: You need to know what you are comparing and how. That may be easier said than done. Knowing that "machenry" is a Scottish name, but "machinery" is not (or is it, and, if not, why not) makes "knowing what you are comparing" non-trivial.
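
A minimal sketch of that tie-breaking point, using a made-up word list and a length-only comparator as the stand-in for "your own comparison function":

```perl
use strict;
use warnings;

# Hypothetical input with repeated elements.
my @words = qw(pear fig plum fig pear kiwi);

# Length only: every four-letter word compares equal to every other,
# so nothing forces the two copies of "pear" to end up next to each other.
my @by_length = sort { length($a) <=> length($b) } @words;

# Length, then the bare-sort comparison as a tie-breaker:
# identical strings are now guaranteed to be adjacent.
my @with_tiebreak = sort { length($a) <=> length($b) || $a cmp $b } @words;

print "length only:   @by_length\n";
print "with tiebreak: @with_tiebreak\n";
```

With the tie-breaker, the second line reads `fig fig kiwi pear pear plum`; the first line depends on how the sort treats equal elements.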
Re^5: best sort by tchrist (Pilgrim) on Aug 16, 2011 at 17:30 UTC
"If those were the things I was looking into, I would certainly not want a bare sort for any of them. If you anticipate repeated elements, you'd probably want a tie-breaking `$a cmp $b`"

Well, sure. Here’s some of the code to generate one of those:

    say $_->{PHRASE} for sort {
           $b->{TOTAL_VOWELS}  <=> $a->{TOTAL_VOWELS}
        || $b->{MAX_ANY_VOWEL} <=> $a->{MAX_ANY_VOWEL}
        || $b->{NUM_OF_A}      <=> $a->{NUM_OF_A}
        || $b->{NUM_OF_E}      <=> $a->{NUM_OF_E}
        || $b->{NUM_OF_I}      <=> $a->{NUM_OF_I}
        || $b->{NUM_OF_O}      <=> $a->{NUM_OF_O}
        || $b->{NUM_OF_U}      <=> $a->{NUM_OF_U}
        || $b->{NUM_OF_Y}      <=> $a->{NUM_OF_Y}
        || $a->{DICTFOLD}      cmp $b->{DICTFOLD}
        || $a->{RECNO}         <=> $b->{RECNO}
    } @records;

Look more reasonable?
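
For anyone who wants to run that kind of multi-key comparator, here is a self-contained sketch. The field names come from the post, but how each field is computed is an assumption: `make_record` is a hypothetical helper, `DICTFOLD` is guessed to be a simple `lc`, and `y` is counted as a vowel because the post has a `NUM_OF_Y` key.

```perl
use strict;
use warnings;

# Hypothetical record builder; the field definitions are guesses.
sub make_record {
    my ($phrase, $recno) = @_;
    my %n = (a => 0, e => 0, i => 0, o => 0, u => 0, y => 0);
    $n{ lc $1 }++ while $phrase =~ /([aeiouy])/gi;   # count each vowel
    my ($total, $max) = (0, 0);
    for my $count (values %n) {
        $total += $count;
        $max = $count if $count > $max;
    }
    my %rec = (
        PHRASE        => $phrase,
        RECNO         => $recno,
        DICTFOLD      => lc $phrase,    # assumption: plain case fold
        TOTAL_VOWELS  => $total,
        MAX_ANY_VOWEL => $max,
    );
    $rec{ 'NUM_OF_' . uc($_) } = $n{$_} for keys %n;
    return \%rec;
}

my $i = 0;
my @records = map { make_record($_, $i++) }
              qw(Alf unshrew Carmen xyzzy hemen bitumen acumen);

# Descending on the numeric keys, then ascending on the fold and record number.
my @sorted = map { $_->{PHRASE} } sort {
       $b->{TOTAL_VOWELS}  <=> $a->{TOTAL_VOWELS}
    || $b->{MAX_ANY_VOWEL} <=> $a->{MAX_ANY_VOWEL}
    || $b->{NUM_OF_A}      <=> $a->{NUM_OF_A}
    || $b->{NUM_OF_E}      <=> $a->{NUM_OF_E}
    || $b->{NUM_OF_I}      <=> $a->{NUM_OF_I}
    || $b->{NUM_OF_O}      <=> $a->{NUM_OF_O}
    || $b->{NUM_OF_U}      <=> $a->{NUM_OF_U}
    || $b->{NUM_OF_Y}      <=> $a->{NUM_OF_Y}
    || $a->{DICTFOLD}      cmp $b->{DICTFOLD}
    || $a->{RECNO}         <=> $b->{RECNO}
} @records;

print "@sorted\n";
```

Under these assumptions the seven words come out in the same relative order as at the end of the `Vowel count:` list earlier in the thread.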
Re^6: best sort by jpl (Monk) on Aug 16, 2011 at 18:36 UTC
It helps a lot with the "how I am comparing" part, although if anyone guessed in advance that was how `By vowels:` sorted, I'd like to solicit their advice on the outcome of upcoming NFL games.

It's still not exactly what I had in mind for bringing all identical terms together. The `$a->{RECNO} <=> $b->{RECNO}` is unnecessary if the sort is stable, as sort() is, by default. Since different words can compare equal under the influence of `$a->{DICTFOLD} cmp $b->{DICTFOLD}` (if I'm correctly guessing what `DICTFOLD` is), I still might see `word Word word Word` in the sorted output, when I might have preferred `word word Word Word`, which makes it easier to determine if words are "unique" without repeating all the complicated logic. So I would prefer `$a->{ORIGINAL} cmp $b->{ORIGINAL}` as the final tie-breaker. That's neither better nor worse than your code, merely different.
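
A small sketch of the difference, assuming `DICTFOLD` is roughly `lc` of the original string (a guess, as noted above) and using plain strings instead of records:

```perl
use strict;
use warnings;

my @words = qw(word Word word Word apple);

# Fold only: "word" and "Word" compare equal, so their relative order
# is whatever the sort leaves it as.
my @fold_only = sort { lc($a) cmp lc($b) } @words;

# Fold, then the original string as the final tie-breaker:
# identical spellings are grouped together.
my @fold_then_original = sort { lc($a) cmp lc($b) || $a cmp $b } @words;

print "fold only:          @fold_only\n";
print "fold then original: @fold_then_original\n";
```

With a plain `cmp` and no locale in effect, uppercase sorts before lowercase, so the second line is `apple Word Word word word` rather than `word word Word Word`; either way, identical spellings are adjacent and duplicates can be spotted by comparing neighbours.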