Genesis of a sort routine

I haven't had to do "special" sorts very often and I was quite pleased with the end result on this one- I think it is the most readable/understandable, whilst eliminating testing for the same thing (or it's negation) more than once and the temporary variable. I am sure some (or all) of the steps could have been skipped with experience, the answer looks obvious to me now but wasn't at the beginning.

# sort things containing a colon after things without

# fist attempt
if ( $a =~ /:/ ){
  return $a cmp $b if $b =~ /:/;
  return 1;
}
if ( $b =~ /:/ ){
  return -1;
}
$a cmp $b;

# version 2, only one $a cmp $b 
return  1 if ( $a =~ /:/ and $b !~ /:/ );
return -1 if ( $b =~ /:/ and $a !~ /:/ );
$a cmp $b;

# version 3 with input from a colleague, only apply a re to
# each var once now
if ( my $x = ($a =~ /:/) <=> ($b =~ /:/) ){ 
  return $x; 
}
$a cmp $b;

# maybe better maybe not but it allowed me to see the 
# final option-
my $x = ($a =~ /:/) <=> ($b =~ /:/);
$x ? $x : $a cmp $b;

# a good nights sleep and...
(($a =~ /:/) <=> ($b =~ /:/)) || $a cmp $b;
[download]

--
Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Comment on Genesis of a sort routine Download Code

Replies are listed 'Best First'.
Re: Genesis of a sort routine by edan (Curate) on Nov 06, 2003 at 08:53 UTC
You really don't need a regular expression, and it's a bit expensive if you're sorting a largish array. Perhaps look at perlfunc:index, as in: `# $a =~ /:/ index($a, ":") >= 0` [download] Not quite as pretty, but I imagine a bit faster... -- 3dan	[reply] [d/l]
Re: Genesis of a sort routine by Abigail-II (Bishop) on Nov 06, 2003 at 10:32 UTC
If you want to be faster, it's better to eliminate the sort block, instead of making the block slightly faster. Something like the following ought to work (untested): `my @sorted = map {substr $_ => 1} sort map {/:/ ? "1$_" : "0$_"} @unsorted;` [download] Abigail	[reply] [d/l]
Re: Genesis of a sort routine by Abigail-II (Bishop) on Nov 06, 2003 at 10:59 UTC
Here's a benchmark to backup the claim. Replacing the regex in the map with index gives even a slightly better result, but that effect is only minimal compared to eliminating the sort block (linear vs n log n). #!/usr/bin/perl use strict; use warnings; use Benchmark qw /timethese cmpthese/; my $elements = 1_000; my $colon = 50; our @array = map {$_ = crypt $_, sprintf "%02x" => rand 256; substr $_, int rand length, 1, ":" if $colon < rand +100; $_} 1 .. $elements; our (@green, @dan, @abi1, @abi2); cmpthese -10 => { greenFox => '@green = sort {(($a =~ /:/) <=> ($b =~ /:/)) \|\| $a cm +p $b} @array', '3dan' => '@dan = sort {((index ($a, ":") >= 0) <=> (index ($b, ":") >= 0)) \|\| $a cmp $b} + @array', abigail1 => '@abi1 = map {substr $_ => 1} sort map {/:/ ? "1$_" : "0$_"} @array', abigail2 => '@abi2 = map {substr $_ => 1} sort map {index ($_, ":") >= 0 ? "1$_" : "0$_"} +@array', }; warn '@green != @dan', "\n" if "@green" ne "@dan"; warn '@green != @abi1', "\n" if "@green" ne "@abi1"; warn '@green != @abi2', "\n" if "@green" ne "@abi2"; warn '@dan != @abi1', "\n" if "@dan" ne "@abi1"; warn '@dan != @abi2', "\n" if "@dan" ne "@abi2"; warn '@abi1 != @abi2', "\n" if "@abi1" ne "@abi2"; __END__ Rate greenFox 3dan abigail1 abigail2 greenFox 109/s -- -16% -54% -56% 3dan 130/s 19% -- -45% -47% abigail1 236/s 117% 82% -- -4% abigail2 246/s 126% 89% 4% -- [download] Abigail	[reply] [d/l]
Re: Re: Genesis of a sort routine by Anonymous Monk on Nov 06, 2003 at 14:30 UTC
Re: Re: Genesis of a sort routine by BrowserUk (Patriarch) on Nov 06, 2003 at 10:44 UTC
Combining your method with index instead of regex and it goes quicker still. Updated: Right conclusions, wrong evidence. Ignore this the tests are bad. Read more... (2 kB) Updated: I extended the tests without testing the tests! D'oh. Read more... (2 kB)	[reply] [d/l] [select]
Re: Re: Genesis of a sort routine by bart (Canon) on Nov 06, 2003 at 18:38 UTC
As for an attribution: the method applied qualifies as the Rosler-Guttman Transform. See Larry Rolser and Uri Guttman's paper on sorting in Perl and demerphq's summary in "Advanced Sorting - GRT - Guttman Rosler Transform"	[reply]
Re: Re: Genesis of a sort routine by BrowserUk (Patriarch) on Nov 06, 2003 at 09:39 UTC
The two extra comparisons, `>= 0` cost you 10%. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Hooray! Wanted!	[reply] [d/l]
Re: Re: Re: Genesis of a sort routine by edan (Curate) on Nov 06, 2003 at 09:50 UTC
Dunno, in my benchmark (performed after I wrote the node, of course), they seem to come out pretty even, so I'd be likely to go with the 'optimized regex' answer given by the AM. Did you have different results? Maybe my benchmark wasn't very good... -- 3dan	[reply]
Re+: Genesis of a sort routine by BrowserUk (Patriarch) on Nov 06, 2003 at 10:30 UTC
Re: Re+: Genesis of a sort routine by sauoq (Abbot) on Nov 06, 2003 at 10:58 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Genesis of a sort routine by Anonymous Monk on Nov 06, 2003 at 09:01 UTC
I think perl's re's are pretty well optimized for fixed string searches, so I would expect minimal gains from using index in this case.	[reply]
Re: Genesis of a sort routine by Anonymous Monk on Nov 06, 2003 at 07:34 UTC
Interesting post, thanks for including the emergence as well as the finished product. My first thought was rather different: I'd partition the data, and recombine the results of simple separate sorts: `# simple and effective: @sorted = (sort(grep!/:/,@data), sort(grep/:/,@data)); # or to avoid grepping @data twice: (probably not much gain) my (@colo, @noco); /:/ ? push @colo, $_ : push @noco, $_ for @data; @sorted = (sort(@noco), sort(@colo));` [download]	[reply] [d/l]
Re: Genesis of a sort routine by RMGir (Prior) on Nov 06, 2003 at 15:21 UTC
Interesting process, thanks. Here's a good checklist to keep in mind when writing sort routines. It's intended for C++ or .NET, but it applies to perl as well. I think all of your variants are ok, but it's a good list to have handy just in case you need to figure out why sort is core dumping :) Mike	[reply]
Re: Re: Genesis of a sort routine by greenFox (Vicar) on Nov 07, 2003 at 06:24 UTC
Nice link, thanks. Interesting to see the other approaches in the "think outside of the box category" -or at least outside the box I was in :-) The data set is small enough (10-15 entries max) that efficiency is not a real issue but it is good to be aware of other techniques for later. My thanks to every-one who responded. -- Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho	[reply]
Re: Genesis of a sort routine by Roger (Parson) on Nov 06, 2003 at 06:04 UTC
This is great stuff. I will ++ this node and add it to my library of perl idioms.	[reply]