Outaspace has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks,

I was just wondering if there is an fast way to check if a sort has really changed an array (fast means without comparing the array before and after) or if all values in the array where all ready in the sorted order. All ways i came up with are slow (if the array is big).

Andre

Replies are listed 'Best First'.
Re: Fast way to check if a sort was doen
by archfool (Monk) on Jun 28, 2007 at 15:25 UTC
    Couldn't you make a flag variable and define a sort block? Concept:
    $changed = 0; my @array = sort { my $result = ($a cmp $b); $changed = 1 if $result==1; # Only set changed if $a > $b $result } @oldarray;
      I like this idea, but I couldn't get it to work.
      my @oldarray = (0..9); my @array = sort { my $result = ($a cmp $b); print "$result "; $result } @oldarray; print "\n@array\n";
      printed these results:
      -1 -1 -1 -1 -1 -1 1 -1 1 1 1 -1 1 -1 1 1 1 1 1 0 1 2 3 4 5 6 7 8 9
      Without knowing exactly how the sorting algorithm is implemented, I don't see how we could reliably use this kind of method.

      If OP wants to write a sorting function, they can include a flag... otherwise, this problem seems pretty solidly O(N) to me.

      ~dewey
Re: Fast way to check if a sort was doen
by syphilis (Archbishop) on Jun 28, 2007 at 15:24 UTC
    Hi,
    So you've done something like @sorted = sort @array;

    How would it be if you did a Digest::MD5::md5("@sorted"); and a Digest::MD5::md5("@array"); and checked the returned strings for equivalence ?

    Cheers,
    Rob
      I believe this would still be O(2n). Even if Digest::MD5::md5 is really written in C, so it might be FASTER, but it's the same amount of work. i.e. you are still running through each element of both arrays.
        But I think it is the fastest way. If I would test for a change in the sort sub, there would be maybe more comparisons. I think I use this because it also works for "reverse sort @array".

        Thanks,
        Andre
Re: Fast way to check if a sort was doen
by jZed (Prior) on Jun 28, 2007 at 15:30 UTC
    Insert a "Z" at the front of the array and if it's still at the front, the sort wasn't done.

    update : :-)

      I like this sentiment. Basically, you iterate the array until you find the first unsorted element (in this case it's the first item), and then trigger the sort. If you don't want to do that, then use a flag that is set any time the array is sorted. Some sanity could be gained if all items were only ever added to either the top XOR the bottom of the array.

      My final thought is perhaps you should use a sort like the 'Insertion Sort'. It will, on the average cases, deal with already sorted arrays at O(N).

      Good luck!

      Kurt

      My read of the original question is that he's not asking "how do I tell if the array has been sorted yet?" but rather "I just sorted an array; how can I tell whether the order was changed?" Inserting a non-sorted element doesn't really apply to the question that I think was asked, other than by providing the trivial answer of "you now know that it was changed because you deliberately made it unsorted first".

      As for an actual answer, that'd take some benchmarking of different detection methods to be sure what's fastest, but my money's on the MD5 solution, especially for large arrays.

      This would only work with "sort @array" but not with "reverse sort @array"
      hmmmm
      my @array = qw( Zygoma Zebra Zulu );
        I was going to argue and say that you shouldn't be overliteral. Obviously you can translate what he said to "insert some identifiable token". It doesn't matter what the sentinel token is, but selecting one does require a bit of knowledge of the input domain or the input set.

        In general, though, I'll assume that you meant that in the spirit of saying that there's no a priori way to verify a sorted set that's faster than O(n), and that there's no sentinel token value that exists which would be useful for in all possible input domains using a posteriori validation.

        --
        [ e d @ h a l l e y . c c ]

Re: Fast way to check if a sort was doen
by lima1 (Curate) on Jun 28, 2007 at 16:20 UTC
    Do you have two arrays, before and after sorting (yeah, I read that you don't want to compare the arrays...)? If yes, then for big arrays, you could write a simple test that compares n indices:
    for my $r (@rand) { return 0 if $a[$r] ne $b[$r]; } return 1;
    Not 100% safe of course, but if the input lines are not related, it should work. To be sure, you could apply a second test if the first test returned 1: Array::Compare uses join to compare two arrays. (This is probably fast enough even without the first test.)

    If you don't have the two arrays, you won't get better than O(n) or you use a similar (unsafe) test like above - check if n blocks of size 2 are sorted.

    UPDATE: md5 seems to be a little bit faster than join:

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all) ; use List::Util qw(shuffle); use Digest::MD5 qw(md5); open my $JOHN, '<', 'dictionary' or die $!; my @array = <$JOHN>; my @sorted = sort @array; $a = \@array; $b = \@sorted; timethese(5000, { 'join' => sub { return join('|',@$a) eq join('|', @$b); }, 'md5' => sub { return Digest::MD5::md5(@$a) eq Digest::MD5::md5(@$b) +; }, });
    perl cmparray.pl Benchmark: timing 5000 iterations of join, md5... join: 29 wallclock secs (27.91 usr + 0.18 sys = 28.09 CPU) @ 17 +8.00/s (n=5000) md5: 27 wallclock secs (25.20 usr + 0.19 sys = 25.39 CPU) @ 19 +6.93/s (n=5000)
      On my machine it's the opposite. Join is faster. I tried several array sizes (sorted and unsorted arrays) with random numbers. I have Windows laptop with Activestate Perl build 819 (v5.8.8).
      Benchmark: timing 1000 iterations of join, md5... join: 46 wallclock secs (43.51 usr + 0.07 sys = 43.58 CPU) @ 22 +.95/s (n=1000) md5: 55 wallclock secs (53.15 usr + 0.03 sys = 53.18 CPU) @ 18 +.81/s (n=1000)
Re: Fast way to check if a sort was doen
by BrowserUk (Patriarch) on Jun 28, 2007 at 22:47 UTC

    It surely has to be faster to do a short-circuiting, worst-case O(N) pass to determine if the sort is necessary before performing an O(N log N ) sort, than to perform the uneeded sort and then perform an O(2+a bit N) pass to see if it was necessary.

    The advantage will vary with how often the array needs sorting, but with a simple 50:50 split, and the pre-check done in perl, it seems to be at least 1/3 faster:

    #! perl -slw use strict; use List::Util qw[ shuffle ]; use Benchmark qw[ cmpthese ]; use Digest::MD5 qw[ md5 ]; open my $fh, '<', 'words' or die $!; my @sorted = <$fh>; close $fh; my @shuffled = shuffle @sorted; cmpthese( 10, { join => sub{ my @resorted1 = sort @sorted; my $r1 = join( '|', @resorted1 ) eq join( '|', @sorted ); my @resorted2 = sort @shuffled; my $r2 = join( '|', @resorted2 ) eq join( '|', @shuffled ); }, md5 => sub{ my @resorted1 = sort @sorted; my $r1 = md5( @resorted1 ) eq md5( @sorted ); my @resorted2 = sort @shuffled; my $r2 = md5( @resorted2 ) eq md5( @shuffled ); }, CheckFirst => sub{ my $sortNeeded = 0; for( 1 .. $#sorted ){ $sorted[ $_ -1 ] gt $sorted[ $_ ] and $sortNeeded = 1 and last } my @resorted1 = sort @sorted if $sortNeeded; $sortNeeded = 0; for( 1 .. $#shuffled ){ $shuffled[ $_ -1 ] gt $shuffled[ $_ ] and $sortNeeded = 1 and last } my @resorted2 = sort @shuffled if $sortNeeded; }, }); __END__ C:\test>junk7 Rate join md5 CheckFirst join 1.03/s -- -1% -25% md5 1.04/s 1% -- -25% CheckFirst 1.38/s 33% 32% -- C:\test>junk7 Rate join md5 CheckFirst join 1.03/s -- -0% -25% md5 1.04/s 0% -- -25% CheckFirst 1.38/s 34% 33% -- C:\test>junk7 Rate join md5 CheckFirst join 1.03/s -- -2% -25% md5 1.05/s 2% -- -24% CheckFirst 1.37/s 34% 31% --

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Fast way to check if a sort was doen
by wind (Priest) on Jun 28, 2007 at 20:35 UTC
    The simple O(n) solution:
    use Test; use strict; INIT { plan tests => 5 } sub issorted(&@) { my $code = shift; local $a, $b; for my $i (0..$#_-1) { ($a, $b) = @_[$i, $i+1]; &{$code}() <= 0 || return 0; } 1; } ok(issorted {$a cmp $b} ('a'..'z')); ok(issorted {$b cmp $a} reverse ('a'..'z')); ok(! issorted {$a <=> $b} (1..9,3,10)); ok(issorted {$a <=> $b} (1..10)); ok(issorted {$a cmp $b} (1,1,1,1,1)); ok(issorted {$a <=> $b} ()); ok(issorted {$a <=> $b} (1)); ok(issorted {$a <=> $b} (1,2)); ok(!issorted {$a <=> $b} (2,1));
    Update 20070629: Added tests for low element count boundary condition. Already passes.

    - Miller
Re: Fast way to check if a sort was doen
by jbert (Priest) on Jun 28, 2007 at 16:07 UTC
    It's might not be fast, but you could tie the array and proxy all methods straight through to the normal behaviour apart from STORE, which would just set a flag which you could test.

    There'll be some overhead, but it might be less than a custom sort routine, I'm not sure.

    See Tie::Array for documentation on how to go about this, I think you just need to inherit from Tie::StdArray and overload STORE.

Re: Fast way to check if a sort was done (using stringification is *broken*)
by bobf (Monsignor) on Jun 29, 2007 at 02:29 UTC

    Stringifying the array can lead to incorrect results in some cases. Consider the following:

    use strict; use warnings; use Digest::MD5 qw( md5 ); my @a1 = ( 'aa', 'a', '' ); print '[', join( '|', @a1 ), "]\n"; my $d1 = md5( @a1 ); my @sorted_a1 = sort @a1; print '[', join( '|', @sorted_a1 ), "]\n"; my $sorted_d1 = md5( @sorted_a1 ); printf( "Digests are %s\n", ( $d1 eq $sorted_d1 ) ? 'equal' : 'unequal' );
    Prints:
    [aa|a|] [|a|aa] Digests are equal

    In this case, the elements in the array were completely reversed by the sort but the md5 digest was identical.

    This is (partly) documented in the docs for Digest::MD5:

    The result of md5("a", "b", "c") will be exactly the same as the result of md5("abc").

      my $signature = md5(pack('(J/a*)*', @array));

      should do the trick quickly.

      Update: Of course, md5 really doesn't add anything. You could just use

      my $signature = pack('(J/a*)*', @array);

        Nice solution! I didn't know about '/' in pack.

        It's not quite perfect, though. It treats the empty string and undef as equal (there may be more ways to break it - this is just the first one I thought of):

        my $signature_1 = pack( '(J/a*)*', '', undef ); my $signature_2 = pack( '(J/a*)*', undef, '' ); printf( "Signatures are %s\n", ( $signature_1 eq $signature_2 ) ? 'equal' : 'unequal' ); # eq +ual!

        These cases were not specified in the OP. I apologize for being pedantic. :-)

Re: Fast way to check if a sort was doen
by FunkyMonk (Bishop) on Jun 28, 2007 at 22:37 UTC
    There's been some interesting suggestions by my fellow Monks that will work in all cases, specifically when you don't have control over access to the array (eg being changed by a module).

    However, if you have got control over access, wrap it all up as a class:

    package My::Array::Sorted; my $changed = 0; sub new { my ( $class, @data ) = @_; my $self = bless [], $class; $self->push( @data ); return $self; } sub sort { my $self = shift; @$self = sort @$self if $changed; $changed = 0; } sub push { my $self = shift; push @$self, @_; $changed = 1; } sub change { my ( $self, $index, $value ) = @_; $self->[$index] = $value; $changed = 1; } sub show { my $self = shift; print join( ", ", $self->get() ), "\n"; } package main; my $array = My::Array::Sorted->new( 5, 4, 3, 2, 1 ); $array->show(); $array->push( 10, 12, 14, 13, 11 ); $array->show(); $array->change( 3, 55 ); $array->show();

    You might have to change or add some methods; it all depends on your intended use.