I haven't had to do "special" sorts very often and I was quite pleased with the end result on this one- I think it is the most readable/understandable, whilst eliminating testing for the same thing (or it's negation) more than once and the temporary variable. I am sure some (or all) of the steps could have been skipped with experience, the answer looks obvious to me now but wasn't at the beginning.

# sort things containing a colon after things without # fist attempt if ( $a =~ /:/ ){ return $a cmp $b if $b =~ /:/; return 1; } if ( $b =~ /:/ ){ return -1; } $a cmp $b; # version 2, only one $a cmp $b return 1 if ( $a =~ /:/ and $b !~ /:/ ); return -1 if ( $b =~ /:/ and $a !~ /:/ ); $a cmp $b; # version 3 with input from a colleague, only apply a re to # each var once now if ( my $x = ($a =~ /:/) <=> ($b =~ /:/) ){ return $x; } $a cmp $b; # maybe better maybe not but it allowed me to see the # final option- my $x = ($a =~ /:/) <=> ($b =~ /:/); $x ? $x : $a cmp $b; # a good nights sleep and... (($a =~ /:/) <=> ($b =~ /:/)) || $a cmp $b;

--
Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Replies are listed 'Best First'.
Re: Genesis of a sort routine
by edan (Curate) on Nov 06, 2003 at 08:53 UTC

    You really don't need a regular expression, and it's a bit expensive if you're sorting a largish array. Perhaps look at perlfunc:index, as in:

    # $a =~ /:/ index($a, ":") >= 0

    Not quite as pretty, but I imagine a bit faster...

    --
    3dan

      If you want to be faster, it's better to eliminate the sort block, instead of making the block slightly faster. Something like the following ought to work (untested):
      my @sorted = map {substr $_ => 1} sort map {/:/ ? "1$_" : "0$_"} @unsorted;

      Abigail

        Here's a benchmark to backup the claim. Replacing the regex in the map with index gives even a slightly better result, but that effect is only minimal compared to eliminating the sort block (linear vs n log n).
        #!/usr/bin/perl use strict; use warnings; use Benchmark qw /timethese cmpthese/; my $elements = 1_000; my $colon = 50; our @array = map {$_ = crypt $_, sprintf "%02x" => rand 256; substr $_, int rand length, 1, ":" if $colon < rand +100; $_} 1 .. $elements; our (@green, @dan, @abi1, @abi2); cmpthese -10 => { greenFox => '@green = sort {(($a =~ /:/) <=> ($b =~ /:/)) || $a cm +p $b} @array', '3dan' => '@dan = sort {((index ($a, ":") >= 0) <=> (index ($b, ":") >= 0)) || $a cmp $b} + @array', abigail1 => '@abi1 = map {substr $_ => 1} sort map {/:/ ? "1$_" : "0$_"} @array', abigail2 => '@abi2 = map {substr $_ => 1} sort map {index ($_, ":") >= 0 ? "1$_" : "0$_"} +@array', }; warn '@green != @dan', "\n" if "@green" ne "@dan"; warn '@green != @abi1', "\n" if "@green" ne "@abi1"; warn '@green != @abi2', "\n" if "@green" ne "@abi2"; warn '@dan != @abi1', "\n" if "@dan" ne "@abi1"; warn '@dan != @abi2', "\n" if "@dan" ne "@abi2"; warn '@abi1 != @abi2', "\n" if "@abi1" ne "@abi2"; __END__ Rate greenFox 3dan abigail1 abigail2 greenFox 109/s -- -16% -54% -56% 3dan 130/s 19% -- -45% -47% abigail1 236/s 117% 82% -- -4% abigail2 246/s 126% 89% 4% --
        Abigail

        Combining your method with index instead of regex and it goes quicker still.

        Updated: Right conclusions, wrong evidence. Ignore this the tests are bad.

        Updated: I extended the tests without testing the tests! D'oh.

      The two extra comparisons,  >= 0 cost you 10%.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      Hooray!
      Wanted!

        Dunno, in my benchmark (performed after I wrote the node, of course), they seem to come out pretty even, so I'd be likely to go with the 'optimized regex' answer given by the AM. Did you have different results? Maybe my benchmark wasn't very good...

        --
        3dan

      I think perl's re's are pretty well optimized for fixed string searches, so I would expect minimal gains from using index in this case.

Re: Genesis of a sort routine
by Anonymous Monk on Nov 06, 2003 at 07:34 UTC

    Interesting post, thanks for including the emergence as well as the finished product. My first thought was rather different: I'd partition the data, and recombine the results of simple separate sorts:

    # simple and effective: @sorted = (sort(grep!/:/,@data), sort(grep/:/,@data)); # or to avoid grepping @data twice: (probably not much gain) my (@colo, @noco); /:/ ? push @colo, $_ : push @noco, $_ for @data; @sorted = (sort(@noco), sort(@colo));
Re: Genesis of a sort routine
by RMGir (Prior) on Nov 06, 2003 at 15:21 UTC
    Interesting process, thanks.

    Here's a good checklist to keep in mind when writing sort routines. It's intended for C++ or .NET, but it applies to perl as well.

    I think all of your variants are ok, but it's a good list to have handy just in case you need to figure out why sort is core dumping :)


    Mike

      Nice link, thanks.

      Interesting to see the other approaches in the "think outside of the box category" -or at least outside the box I was in :-) The data set is small enough (10-15 entries max) that efficiency is not a real issue but it is good to be aware of other techniques for later. My thanks to every-one who responded.

      --
      Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re: Genesis of a sort routine
by Roger (Parson) on Nov 06, 2003 at 06:04 UTC
    This is great stuff. I will ++ this node and add it to my library of perl idioms.