in reply to Re^6: Using $a and $b from XS
in thread Using $a and $b from XS

This last example does not work for me. For example, when I try:
my $max = 10;
my @values = 1 .. $max;
my @values_mixed = shuffle(@values);
my @top = topNbsAB(5, sub { $a <=> $b }, \@values_mixed);
print join(" ", sort { $a <=> $b } @top), "\n";

I get a result like:

1 1 1 1 9

I'm also working out how to avoid the extra creation of zero-fill items. Ideally, I would replace them with the first n items of the input in reverse-sorted order. There are two reasons for this:

1) We cannot yet handle negative numbers.

2) We should be testing for the case that the input length is less than or equal to n, and in that case returning everything they gave us. Something like:

// Special case -- too few values provided. Just return them.
// (assuming len is the highest index from av_len(), so there are len+1 values)
if (len+1 <= n) {
    for( i = 0; i <= len; i++ ) {
        SV *val = *av_fetch( data, i, 0 );
        PUSHs( newSVsv( val ) );
    }
    PUTBACK;
    XSRETURN(len+1);
}

Re^8: Using $a and $b from XS
by BrowserUk (Patriarch) on Feb 08, 2005 at 01:52 UTC

    If you reverse the order of $a and $b in the comparator, you'll see it works okay for the only case you have coded so far.
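
    For example (purely illustrative, reusing the call from the parent node):

        my @top = topNbsAB( 5, sub { $b <=> $a }, \@values_mixed );   # works for the case coded so far
        my @top = topNbsAB( 5, sub { $a <=> $b }, \@values_mixed );   # produces the odd result shown above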

    To be honest, I haven't been trying to fix up the algorithms, that's your department :) I've been concentrating on how to make the callback from C to Perl.

    I'm also working out how to avoid the extra creation of zero-fill items.
    1) We cannot yet handle negative numbers.
    2) We should be testing for the case that the input length is less than or equal to n, and in that case returning everything they gave us. Something like:

    These become non-issues once we fix the main problem--that of building the list when you don't know what the sort order will be.

    There is no useful value that we can use to initialise the list. Even undef is a legitimate value for the caller to pass us as a part of the list, and their comparator function can choose to order that in whatever way they see fit.

    Hence, the problem moves from how to initialise the list, to one of deciding when to add a new value to the list. We don't need to pre-initialise the list with a sorted list. We just build it as we go and the algorithm will take care of the sorting.

    Essentially, we just stick each new value on the end of the list until we reach the limit, and effectively ignore its presence when doing the binary search. Once we find the legitimate position, the new, unsorted value we tacked on the end will get shifted off when we insert that same value into its proper place.
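
    A quick worked trace (illustrative values only, assuming n = 3 and the largest-first numeric comparator from the code below): with @ST = ( 9, 4 ) and the next input value 7,

        tack it on the end:                @ST = ( 9, 4, 7 )
        binary search the sorted part  --> insert at slot 1
        shift slots 1.. along by one:      @ST = ( 9, 9, 4, 7 )
        drop the value into its slot:      @ST = ( 9, 7, 4, 7 )

    Only the first three slots are ever returned, so the copy of 7 that was shifted past the end simply falls outside the slice.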

    Harder to describe than code--though the ternary fudge on the initialisation of $right took me a while to see. It also means that we need to remember to remove the last value from the list before returning it.

    Here it is coded in Perl. I'm not ready to get back into Inline C again yet. Man, that makes you remember just why you like Perl so much :)

    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];

    our $MAX ||= 20;
    our $N = defined $N ? $N : 5;

    my @values = shuffle( 1 .. $MAX );

    my @top = topNbsAB_p( $N, sub { $_[ 0 ] <=> $_[ 1 ] }, \@values );
    print join(" ", @top);

    @top = topNbsAB_p( $N, sub { $_[ 1 ] <=> $_[ 0 ] }, \@values );
    print join(" ", @top);

    @values = shuffle( 'A' .. 'Z' );

    @top = topNbsAB_p( $N, sub { $_[ 0 ] cmp $_[ 1 ] }, \@values );
    print join(" ", @top);

    @top = topNbsAB_p( $N, sub { $_[ 1 ] cmp $_[ 0 ] }, \@values );
    print join(" ", @top);

    sub topNbsAB_p {
        my( $n, $CMP, $data ) = @_;
        return unless $n > 0;

        my @ST;
        my $selected = 0;

        for my $val ( @$data ) {
            # Tack each new value on the end until we reach the limit.
            $ST[ $selected++ ] = $val if $selected < $n;
            next if( $CMP->( $val, $ST[ $selected-1 ] ) < 0 );

            # Binary search for the insertion point; $right is fudged so the
            # search ignores the unsorted value just tacked on the end.
            my $left  = 0;
            my $right = ( $selected > 1 ) ? $selected - 1 : $selected;

            while( $left < $right ) {
                my $middle = ( $left + $right ) >> 1;
                if( $CMP->( $val, $ST[ $middle ] ) <= 0 ) {
                    $left = $middle + 1;
                }
                else {
                    $right = $middle;
                }
            }

            # Shift everything from the insertion point along one slot and
            # drop the value in; any copy pushed past the end is discarded
            # by the slice below.
            for( my $k = $selected; $k >= $left; $k-- ) { $ST[ $k ] = $ST[ $k - 1 ] }
            $ST[ $left ] = $val;
        }

        return @ST[ 0 .. $selected-1 ];
    }

    __END__
    P:\test>junk3 -MAX=10 -N=0

    P:\test>junk3 -MAX=10 -N=1
    10
    1
    Z
    A

    P:\test>junk3 -MAX=10 -N=2
    10 9
    1 2
    Z Y
    A B

    P:\test>junk3 -MAX=10 -N=10
    10 9 8 7 6 5 4 3 2 1
    1 2 3 4 5 6 7 8 9 10
    Z Y X W V U T S R Q
    A B C D E F G H I J

    P:\test>junk3 -MAX=10 -N=11
    10 9 8 7 6 5 4 3 2 1
    1 2 3 4 5 6 7 8 9 10
    Z Y X W V U T S R Q P
    A B C D E F G H I J K

    P:\test>junk3 -MAX=10 -N=26
    10 9 8 7 6 5 4 3 2 1
    1 2 3 4 5 6 7 8 9 10
    Z Y X W V U T S R Q P O N M L K J I H G F E D C B A
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

    P:\test>junk3 -MAX=10 -N=27
    10 9 8 7 6 5 4 3 2 1
    1 2 3 4 5 6 7 8 9 10
    Z Y X W V U T S R Q P O N M L K J I H G F E D C B A
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

    A similar mechanism should work for the heap sort version.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      I still get a weird result when I reverse the $a and $b in the comparator. Now it's:
      -1 0 0 0 7
      Oh well, I think I'll keep the comparison caller in a separate subroutine for simplicity and maintainability anyway.

      A similar mechanism should work for the heap sort version.

      Actually, I don't need to do the same thing for the heap. All I need to do is call makeHeap on the first N items. That's O(N); building up the heap item-by-item is O(N log N).
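
      For what it's worth, here is a minimal pure-Perl sketch of that O(N) build (make_heap is an illustrative name, not the thread's XS; it assumes the heap is kept in an array with the comparator's "first" element at the root):

          sub make_heap {
              my( $CMP, $heap ) = @_;
              # Bottom-up heapify: sift down every non-leaf node, last first.
              for( my $i = int( @$heap / 2 ) - 1; $i >= 0; $i-- ) {
                  my $j = $i;
                  while( 1 ) {
                      my $top = $j;
                      for my $child ( 2*$j + 1, 2*$j + 2 ) {
                          $top = $child
                              if $child < @$heap
                              and $CMP->( $heap->[ $child ], $heap->[ $top ] ) < 0;
                      }
                      last if $top == $j;
                      @{$heap}[ $j, $top ] = @{$heap}[ $top, $j ];
                      $j = $top;
                  }
              }
          }

      Calling it on just the first N inputs, e.g. make_heap( $CMP, [ @{$data}[ 0 .. $N-1 ] ] ), is the O(N) step; pushing the same N items onto the heap one at a time is what costs O(N log N).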

      I'm thinking that this is good enough to go ahead with. If I implement the four simple cases (maxN, minN, maxNstr, minNstr) as you suggested, and also a user-supplied comparator, that will be fine.

      We've been giving perl's sort an unfair advantage against our homemade comparator. I tried a benchmark against a call it didn't know how to optimize, { ($a < $b) ? -1 : (($a > $b) ? 1:0) }, and our partial sorts do better in many cases (5 in 1000, 5 in 10,000 etc.).
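
      Roughly the shape of the comparison (a sketch only, not the actual harness; topNbsAB_p here stands in for the partial sorts being benchmarked, and the sizes are illustrative):

          use Benchmark qw[ cmpthese ];

          my @data = map { int rand 1_000_000 } 1 .. 10_000;

          cmpthese( -3, {
              'sort optimised'   => sub {
                  my @top = ( sort { $a <=> $b } @data )[ 0 .. 4 ];
              },
              'sort unoptimised' => sub {
                  my @top = ( sort { ($a < $b) ? -1 : (($a > $b) ? 1 : 0) } @data )[ 0 .. 4 ];
              },
              'partial sort'     => sub {
                  my @top = topNbsAB_p( 5, sub { $_[ 1 ] <=> $_[ 0 ] }, \@data );
              },
          } );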

        I still get a weird result when I reverse the $a and $b in the comparator.

        Sorry, I'm lost as to which version you are referring to: my C, your C, or my Perl?

        The C code I posted two levels back only worked with { $b <=> $a } because that's all the topNbs algorithm handled. If you are indicating something else, then I'll need a heavier cluebat :)

        Actually, I don't need to do the same thing for the heap.

        Then why have I been faffing around fixing the topNbs() algorithm? :)

        I've only been using topNbs for testing, because the heap version never worked on my system. Your compiler apparently allows variable declarations at places other than the top of a block (a la C++), but mine doesn't--or maybe it could if I applied the appropriate switches.

        I was more interested in trying to make one algorithm work as well as I could re: building the return list and calling the comparator, and the topNbs algorithm is simpler to work with than the heap for testing.

        It also beats the heap for all but the very largest of subsets--greater than 10,000 from memory. It would appear that the saving the heap gains from moving fewer elements around is considerably offset by the complexity of the algorithm. Inserts into the binary chop may move more elements, but they do so in a very tight, optimisable loop. You have to be moving a large number of elements to outweigh the cost of the heap algorithm.

        We've been giving perl's sort an unfair advantage against our homemade comparator. I tried a benchmark against a call it didn't know how to optimize, { ($a < $b) ? -1 : (($a > $b) ? 1:0) }, and our partial sorts do better in many cases (5 in 1000, 5 in 10,000 etc.).

        I would say that was artificially giving sort a disadvantage. It does know how to optimise the four basic cases, and in those cases it has the advantage once the subset gets to above ~ 10% of the set. It's only with very small subsets of large lists that we win--if we have to call the comparator and it doesn't.

        That said, I also tested a version of topNbs that didn't call the comparator at all. And again, probably due to my lack of proficiency in XS, in all but a few extreme cases (very small subsets from very large lists), sort-and-slice wins most times.

        sort has had a lot of clever people adding to its optimisation over many years, and I've been doing XS for 4 days!

        I'm currently attempting to work at it from the other end, by starting with the code for reduce from List::Util.xs. I've succeeded in getting that to build and function correctly from within the Inline::C framework.

        I intend first to test whether doing it through Inline::C adds any noticeable overhead, by comparing my version directly with the XS-built version. Assuming any overhead is minimal, I'll then try to insert the topNbs code into it and see how it performs relative to the other two versions (separate callComp() and inlined call_sv()).
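
        For reference, the "separate comparison caller" shape looks roughly like this as an Inline::C sketch (my own illustration, not the thread's actual callComp(); it passes the two values as arguments rather than aliasing $a and $b):

            use Inline C => q{

                /* Call a Perl comparator with two SVs and return its integer
                   result. This is the standard perlcall pattern; it does not
                   touch $a or $b at all. */
                int call_comparator( SV *cmp, SV *x, SV *y ) {
                    dSP;
                    int result;

                    ENTER;
                    SAVETMPS;

                    PUSHMARK( SP );
                    XPUSHs( x );
                    XPUSHs( y );
                    PUTBACK;

                    call_sv( cmp, G_SCALAR );

                    SPAGAIN;
                    result = POPi;
                    PUTBACK;

                    FREETMPS;
                    LEAVE;

                    return result;
                }
            };

            print call_comparator( sub { $_[ 0 ] <=> $_[ 1 ] }, 3, 7 ), "\n";   # prints -1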

        It will probably work better, as I can already see several things it does in a simpler, more efficient way than I am doing. I'm still doing the aliasing to $a and $b all wrong. In fact, I'm not aliasing at all, and I am stomping on any existing values $a and $b might have in the calling code--i.e. not doing the equivalent of local( $a, $b );.
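
        In pure Perl, "the equivalent of local( $a, $b )" amounts to something like this (a hypothetical helper for illustration; it assumes the comparator lives in the same package, whereas the XS has to find the right package's $a and $b via the caller):

            # Call a comparator that uses $a/$b without clobbering whatever
            # values the caller already had in them.
            sub call_cmp_with_ab {
                my( $cmp, $x, $y ) = @_;
                local( $a, $b ) = ( $x, $y );   # saved here, restored on scope exit
                return $cmp->();
            }

            print call_cmp_with_ab( sub { $a <=> $b }, 2, 5 ), "\n";   # prints -1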

        There is a lot going on in reduce() that I do not understand, but maybe by the time I do, I'll be in a better position?


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.