Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

sort of misunderstanding of sort

by Discipulus (Canon)
on Mar 21, 2023 at 10:39 UTC ( [id://11151091] : perlmeditation . print w/replies, xml ) Need Help??

Fellow ones,

after so many years I still lack the understanding of basics.

I always thought of sort as if it should check every element of a list against any other element: this is not true. Infact rereading the sort docs, the very bottom line, I read: is implemented via the mergesort algorithm

Mergesort is based on old Latin motto divide et impera and because of this not every combination is take in count while sorting. It is optimized to compute only the lowest number of checks.

Let clean the field: sort is not like map infact:

  • perl -we "sort @ARGV" 1 2 gives Useless use of sort in void context at -e line 1
  • perl -we "$x = sort @ARGV" 1 2 gives Useless use of sort in scalar context at -e line 1
So must to be used only in list context, it seems wise unless if you want to hack it: in my Re: Perl Weekly Challenge 206 -- oneliner I used sort to push results in @r but, with my usual try&error, I found I need to use as return value for sort the subtraction $a-$b infact returning 0 failed the test with 10:10 09:30 09:00 09:55 as arguments. Using 1 as return value failed with 01:01 00:50 00:57 and returning -1 failed the test with 10:10 09:30 09:00 09:55

Why?

In the docs I read also subroutine that returns an integer less than, equal to, or greater than 0 so I tried to see what happens forcing the returned value:

perl -le "@a = sort{ print qq($a $b); 0 }@ARGV" 1 2 3 4 1 2 3 4 1 3 3 2 perl -le "@a = sort{ print qq($a $b); 1 }@ARGV" 1 2 3 4 1 2 3 4 2 4 2 3 perl -le "@a = sort{ print qq($a $b); -1 }@ARGV" 1 2 3 4 1 2 3 4 1 3 3 2 2 4 # one iterati +on more with -1

In the last example there is an iteration more: why? I'm fooling the algorithm telling $b is always less than $a or what?

Also the only missing couple frome the last example is 1 4 can I force sort to be unoptimized as I wanted in my oneliner example?

Beside Very nice title but: hey kid! Stop messing with sort code! do you have something wise to share?

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re: sort of misunderstanding of sort
by hv (Prior) on Mar 21, 2023 at 11:35 UTC

    Generally, sort functions are designed to do sorting as efficiently as possible, typically in O(n log(n)) comparisons. I'm not aware of any way to force perl's sort to do all n(n - 1) / 2 comparisons: if the purpose is not sorting, best not use sort() to do it.

    The results from the comparator are intended to describe an ordering on the elements: compare(A, B) < 0 asserts that A < B, etc. There is an expectation of consistency here, so we expect identities like these to hold:

    A = A A = B <=> B = A A < B <=> B > A A <= B <=> B >= A (A < B) & (B < C) => A < C (A <= B) & (B <= C) => A <= C

    Note this important paragraph skulking near the end of perldoc -f sort:

    The comparison function is required to behave. If it retur +ns inconsistent results (sometimes saying $x[1] is less than +$x[2] and sometimes saying the opposite, for example) the result +s are not well-defined.

    For a bit more about what can go wrong, see the accepted response on this related C++ question: What will std::sort do if the comparison is inconsistent?. I'm not aware that perl's implementation is at risk of crashing, but you are certainly entitled to get weird results.

Re: sort of misunderstanding of sort
by LanX (Saint) on Mar 21, 2023 at 11:38 UTC
    I assume you want the cross-product of a list with itself.

    sort can't do this because it's designed to be fast with O(n*log(n))

    There used to be a way to change the applied algorithm, but that's deprecated now https://perldoc.perl.org/sort

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      > I assume you want the cross-product

      DB<34> sub cross (&$$) { my ($code,$list1,$list2) = @_; local ($a,$b +); my @res; for $a (@$list1) { for $b (@$list2) { push @res, &$code } +}; return @res; } DB<35> x cross { $a + $b } [0..2], [10,20,30] 0 10 1 20 2 30 3 11 4 21 5 31 6 12 7 22 8 32 DB<36>

      you can just let $list2 default to $list1 if undef to have a self-operation with only one list.

      $list2 //= $list1

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

Re: sort of misunderstanding of sort
by ikegami (Patriarch) on Mar 21, 2023 at 21:19 UTC

    Merge Sort performs a constant number of comparison for an input of a given size. (There might be some variations in practice since the smaller subdivisions are usually sorted using something other than merge sort.)

    Returning garbage values doesn't not increase the number of comparisons; it just cause the the output to be garbage.

    If you wanted to increase the number of comparisons, use an older version of Perl that supports using Quicksort. See the sort pragma.

      Thanks ikegami and all who answered,

      while now I understand better the matter of sorting, your sentence

      > Returning garbage values doesn't not increase the number of comparisons

      does not seems to be confirmed by my last example: perl -le "@a = sort{ print qq($a $b); -1 }@ARGV" 1 2 3 4

      Infact returning garbage ie: -1 seems to produce always an iteration more in respect of returning 0 or 1

      If I understand the mergesort, the above example, should go like:

      1 2 3 4 [1 2] [3 4] [1 3] [3 2] [2 4]

      So only [1 4] is skipped because derived by previous [1 2] [2 4] To notice also that returning -1 as garbage returns a weird sorted list 1 3 2 4 while returning always 1 produce (as I expected) a reversed list: 4 3 2 1

      As side effect hacking sort can lead to a new shuffle :)

      # naive shuffle perl -le "print for sort { int rand(3) - 1 } @ARGV" 1 2 3 4 5 1 3 5 2 4

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

        I did mention there are optimizations at play over a pure merge sort.

        In fact, my testing shows that passing n sorted items, it only performs n comparisons. This is clearly not the behaviour of a pure merge sort.

        But I was indeed wrong about the amount being constant.

        • Merging 1,2,3 with 4,5,6 requires three comparisons.
        • Merging 1,7,8 with 4,5,6 requires four comparisons.
        Just for fun..

        obfuscated shuffle can be sort {--$|} but, as expected, List::Util shuffle is a bit more performant

        use strict; use warnings; use Benchmark qw(cmpthese); use List::Util qw(shuffle); my $count = 10000; my @numbers = 0..10000; cmpthese($count, { 'List::Util::shuffle' => sub { my @shuffled = shuffle @numbers }, 'Discipulus::shuffle' => sub { my @shuffled = sort { int rand(3) - + 1 } @numbers }, 'Discipulus::--$|fle' => sub { my @shuffled = sort { --$| } @numbe +rs }, }); __END__ Rate Discipulus::--$|fle Discipulus::shuffle Lis +t::Util::shuffle Discipulus::--$|fle 164/s -- -52% + -95% Discipulus::shuffle 342/s 108% -- + -89% List::Util::shuffle 3107/s 1793% 809% + --

        L*

        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: sort of misunderstanding of sort
by eyepopslikeamosquito (Archbishop) on Mar 22, 2023 at 01:10 UTC

    My recent experiences during Rosetta Code: Long List is Long taught me that vanilla sorting in Perl tends to be slow.

    Even Perl experts were surprised. I was similarly shocked when changing from one sort to two (sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href} to sort { $href->{$b} <=> $href->{$a} } sort keys %{$href}) gave a massive performance boost ... and nearly fell off my chair when I later accidentally out-mario-royed marioroy via GRT. :)

    I also learnt that stable sorting algorithms tend to run significantly slower than unstable ones in practice. Yet I don't see Perl supporting unstable sorting algorithms in the near future given:

    From sort - perl pragma to control sort() behaviour "The sort pragma is now a no-op, and its use is discouraged ... The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability"

    The other huge sorting performance boost (pun intended) I remember was here when Boost's sort::block_indirect_sort parallel sort algorithm, running on 8 threads simultaneously, ran massively faster than single-threaded vanilla C++ std::sort.

    I would be interested to learn of future plans to improve Perl sorting performance.

    See also my mandatory list of Sorting References.

Re: sort of misunderstanding of sort
by bliako (Monsignor) on Mar 21, 2023 at 12:56 UTC
    one iteration more with -1

    basically it's like a logical puzzle you have to solve with asking a *minimum* number of questions (like "is 1 less than 2?") and (ideally) employing heuristics and the identities that hv mentions. In fact Perl's sort does a good job: perl -le 'my $iters;@a=sort{ $iters++; $a<=>$b }map { int rand 50000 }(1..500); print $iters' 3852 whereas an exhaustive sort would ask 500*499/2=124750 questions. So Do What I Mean and Don't @&*%$ Ask Me More Questions Than Necessary hehe

    Here is another way to irritate Perl's sort: perl -le 'my$iters;@a=sort{ $iters++;print qq($a $b); [-1,0,1]->[int rand 3] }map { int rand 50000 }(1..500); print $iters'

    Edit: see Acme::ICan for another kind of sort

Re: sort of misunderstanding of sort
by ikegami (Patriarch) on Mar 21, 2023 at 20:57 UTC

    [Woops, this doesn't answer the question.]

    Checking every single pairing would be wasteful. If we know that a < b and that b < c, then we know that a < c. No need to check that one.

    There are algorithms that effectively check every pairing in some circumstances, including Bubble Sort and so-called Quicksort.

    Perl used to use Quicksort because it can be fast outside of the degenerate cases. But the degenerate case happens to be an already-sorted list, which is a common thing to sort.

    Perl now uses Merge Sort, a much better algorithm. It always performs n log2 n comparisons for an input of size n, which is far less than the n2 approach of checking every pairing. (And it's also stable, meaning that items that compare equally will maintain their relative order. This is a very useful property.)

    I linked to the Wikipedia pages of algorithms I mentioned, and they are great. They even include useful animations of the sorting algorithm in action.

Re: sort of misunderstanding of sort
by Anonymous Monk on Mar 21, 2023 at 12:20 UTC

    I think values which end to be adjacent in sorted list are guaranteed to be compared, and so your technique works OK. Usual spaceship operator, as well as through-away result with map sort... or 1 for sort... will do, but are longer to type. And in fact -- don't push and then sort again, but update current minimum so far, as LanX did in his question.