
After a discussion in the chatterbox, with input from tye and Fastolfe, I tried a few different ways of solving a certain kind of sorting problem.

I've got a number of programs that output alphabetically sorted lists, but often there's an "other" category that we want to list after all the "real" elements in the list.

I had a solution that worked, but it looked really ugly, so I tried out two other suggestions and benchmarked those suckers.

#!/usr/bin/perl -w

use strict;
use Benchmark;

# Yeah, I'm using globals; this is to get around troubles
# I was having since Benchmark uses eval ...

use vars qw/@array %last/;

# some values to put in there : mix cases to avoid "asciibetical" sort

@array=qw(xavier colin melissa wally Edward joan Arlen George mohandas ralph other);
$last{other}=1;

sub using_hash {
    defined($last{lc($a)}) cmp defined($last{lc($b)}) or lc($a) cmp lc($b);
}

sub using_return {
        return 1 if lc($a) eq 'other';
        return -1 if lc($b) eq 'other';
        lc($a) cmp lc($b);
}

sub using_eq {
    ( (lc($a) eq 'other') cmp (lc($b) eq 'other') ) || lc($a) cmp lc($b);
}

# for gosh sakes, use a sensible value here
# I needed a large number on a 4 CPU system

timethese (100000,
    {
        use_hash   => q{ my @sorted = sort using_hash @array },
        use_return => q{ my @sorted = sort using_return @array },
        use_eq     => q{ my @sorted = sort using_eq @array },
    }
);

The results: surprisingly (to me), my ugly using_return benchmarked the fastest once I added lc(); with plain "ASCIIbetical" sorting it was marginally slower than using_eq. using_hash is nice because it makes it easy to change, on the fly, the value you want placed last, but it came in third in the speed category. Not that it's *that much* slower, however.
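To illustrate why the hash version is the flexible one, here is a minimal sketch that sinks more than one category to the bottom. The extra key 'unknown', the helper name sort_with_last, and the sample list are made up for illustration, and it uses exists with a numeric comparison in place of the defined/cmp test above. It has not been benchmarked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical set of categories that should be listed last.
# Add or delete keys at run time to change what sinks to the bottom.
my %last = map { $_ => 1 } qw(other unknown);

# Sort case-insensitively, placing any key of %last after the "real" items.
# (sort_with_last is an illustrative name, not part of the benchmark above.)
sub sort_with_last {
    my @items = @_;
    return sort {
        (exists $last{lc $a} ? 1 : 0) <=> (exists $last{lc $b} ? 1 : 0)
            or lc($a) cmp lc($b)
    } @items;
}

my @sorted = sort_with_last(qw(xavier Other colin unknown Edward));
print "@sorted\n";    # colin Edward xavier Other unknown
```

The items that share "last" status are still ordered alphabetically among themselves, because the second comparison only kicks in when the first one ties.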

use_eq: 24 wallclock secs (24.27 usr +  0.00 sys = 24.27 CPU)
use_hash: 27 wallclock secs (27.68 usr +  0.00 sys = 27.68 CPU)
use_return: 23 wallclock secs (22.97 usr +  0.00 sys = 22.97 CPU)
$ ./sort_test.pl
Benchmark: timing 100000 iterations of use_eq, use_hash, use_return ..
use_eq: 24 wallclock secs (23.59 usr +  0.00 sys = 23.59 CPU)
use_hash: 28 wallclock secs (27.75 usr +  0.00 sys = 27.75 CPU)
use_return: 24 wallclock secs (22.86 usr +  0.00 sys = 22.86 CPU)

The differences *are* pretty minor, but I'd like to try them with larger lists, and also after thinking more about optimization.
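One direction the optimization thinking might take, sketched here purely as an assumption and not timed: pull the "other" entries out before sorting, so the comparison routine only has to do the plain case-insensitive compare. The name sort_other_last is hypothetical and is not one of the three routines benchmarked above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Partition first, sort only the "real" elements, then append the rest.
# (sort_other_last is an illustrative name; this variant was not benchmarked.)
sub sort_other_last {
    my @items = @_;
    my @real  = grep { lc($_) ne 'other' } @items;
    my @other = grep { lc($_) eq 'other' } @items;
    return ( (sort { lc($a) cmp lc($b) } @real), @other );
}

my @sorted = sort_other_last(qw(xavier other colin Edward));
print "@sorted\n";    # colin Edward xavier other
```

Whether this actually beats the three sort subs would need the same timethese treatment, ideally on a much longer list.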

Philosophy can be made out of anything. Or less -- Jerry A. Fodor

In reply to A discriminating sort (and some damned lies) by arturo
