tall_man has asked for the wisdom of the Perl Monks concerning the following question:

Starting at Re^10: Better mousetrap (getting top N values from list X), BrowserUk and I have been trying to write a new sorting routine as XS extension that could capture the top N (or the least N) items from a list. It would be very nice if the comparison function could avoid the argument-passing overhead and use $a and $b the way sort does. So far, our attempts seg-fault at least some of the time and on some data.

Here is BrowserUk's sort-of-working example. It crashes if the constant 100 is increased to about 112 or if the constant 5 is increased to 6 or 7. Any suggestions?

#! perl -slw use strict; use List::Util qw[ shuffle ]; use Inline C => 'DATA'; $^W = 0; my $max = 100; my @values = 1 .. $max; my @values_mixed = shuffle( @values ); my @top = topN( 5, sub { $_[ 0 ] <=> $_[ 1 ]; }, \@values_mixed ); print "@top"; __END__ __C__ int callComp( SV* cmp, SV* a, SV* b ) { int rv; dSP;ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( newSViv( SvIVX( a ) ) ); XPUSHs( newSViv( SvIVX( b ) ) ); PUTBACK; call_sv( cmp, G_SCALAR ); SPAGAIN; rv = POPi; FREETMPS;LEAVE; return rv; } void topN( int n, SV* comp, AV*data ) { int *topN; int len = av_len( data ); int i, j, k; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for( i = 0; i < n+1; i++ ) topN[ i ] = newSViv( 0 ); for( i = 0; i <= len; i++ ) { SV* val = *av_fetch( data, i, 0 ); for( j = 0; j < n; j++ ) { int cmp = callComp( comp, topN[ j ], val ); if( cmp >= 0 ) continue; if( cmp < 0 ) { for( k = n; k > j; k-- ) topN[ k ] = topN[ k-1 ]; topN[ j ] = val; break; } } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }

Replies are listed 'Best First'.
Re: Using $a and $b from XS
by BrowserUk (Patriarch) on Feb 05, 2005 at 22:21 UTC

    I fixed the strangely conditional segfaults. I had omitted one of the mysterious macros--or rather only used it once instead of twice.

    int callComp( SV* cmp, SV* a, SV* b ) { dSP; int rv; ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( newSViv( SvIVX( a ) ) ); XPUSHs( newSViv( SvIVX( b ) ) ); PUTBACK; if( call_sv( cmp, G_SCALAR ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; PUTBACK; // << Was missing and is required! FREETMPS;LEAVE; return rv; }

    But I'd still gratefully receive any suggestions from all those XS aware onlookers?

    I'd also appreciate it anyone who's done callbacks using Inline::C, could show me how the above sub would be coded in terms of the Inline_Stack_* macros?


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Using $a and $b from XS
by Mr. Muskrat (Canon) on Feb 05, 2005 at 22:32 UTC

    Totally unrelated to the problem you were having I'm sure but why are you creating one more position in memory than you need?

    Newz( 1, topN, n+1, SV* ); for( i = 0; i < n+1; i++ ) topN[ i ] = newSViv( 0 );

    Update: When I change both instances of n+1 to n, the script continues to provide valid results. I just may have to start concentrating more time on learning XS. This is getting fun.

Re: Using $a and $b from XS
by Mr. Muskrat (Canon) on Feb 05, 2005 at 22:22 UTC

    I increased $max to 112 and it was consistently doing what you described. So I started reducing $max and it appears to be stable at 107. At 108, it displayed the results 14 out of 20 attempts. At 109, out of 20 attempts: 11 displayed, 6 didn't and 3 crashed. At 110, out of 20 attempts: 5 displayed, 10 didn't, 4 crashed with no error and 1 crashed with "Free to wrong pool 222ca8 not 5." At 111, out of 20 attempts: 3 displayed, 4 didn't, 12 crashed with no error and 1 crashed with "Free to wrong pool 222ca8 not 5." At 112, out of 20 attempts: 2 displayed, 3 didn't and 15 crashed with no error.

    Update: I guess I can stop digging through perlguts, perlapi and perlcall now that you fixed it.

Re: Using $a and $b from XS
by tall_man (Parson) on Feb 06, 2005 at 04:11 UTC
    The next trick is to get it to work with $a and $b. The following seems to work, but I'm not sure it isn't corrupting the stack. Any improvements?
    #! perl -slw use strict; use List::Util qw[ shuffle ]; use Inline C => 'DATA'; $^W = 0; my $max = 120; my @values = 1 .. $max; my @values_mixed = shuffle( @values ); my @top = topN( 10, sub { $a <=> $b; }, \@values_mixed ); print "@top"; __END__ __C__ int callComp( SV* cmp, SV* a, SV* b ) { dSP; int rv; ENTER; SAVETMPS; GvSV(gv_fetchpv("main::a", TRUE, SVt_PV)) = a; GvSV(gv_fetchpv("main::b", TRUE, SVt_PV)) = b; PUSHMARK(SP); PUTBACK; if( call_sv( cmp, G_SCALAR|G_NOARGS ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; PUTBACK; // << Was missing and is required! FREETMPS;LEAVE; return rv; } void topN( int n, SV* comp, AV*data ) { int *topN; int len = av_len( data ); int i, j, k; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for( i = 0; i < n+1; i++ ) topN[ i ] = newSViv( 0 ); for( i = 0; i <= len; i++ ) { SV* val = *av_fetch( data, i, 0 ); for( j = 0; j < n; j++ ) { int cmp = callComp( comp, topN[ j ], val ); if( cmp >= 0 ) continue; if( cmp < 0 ) { for( k = n; k > j; k-- ) topN[ k ] = topN[ k-1 ]; topN[ j ] = val; break; } } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }
      Any improvements?

      Only one. You can drop the first PUTBACK; as you aren't stacking any parameters. And it is quicker than passing via the stack. That is very noticable when using my linear search and shift. Will be less evident once your binary search works (it may be I broke that?).

      I'm thinking of trying to do the heap version also.

      I had hoped that we could drop the ENTER;SAVETMPS; and FREETMPS; LEAVE;, which I found I could do (ref:perlguts) with the working version of callComp(), when I was using smallish sets, but it breaks both (through memory leaks) with larger sets.

      I also tried to in-line the code from callComp() into the main routine--but the Inline_Stack_* macros and XS ones seem to make that impossible without moving over to using teh XS macros throughout. Which is probably what I will try next.

      What a way to program!:) Get something to work, and then move things around or comment them out and try it to see if it's required/in the right place or not.

      What with that; error messages that don't relate to the source;intermediate files that the error messages do relate to that change both their names and their directory names every time you changethe program; and the incompatibilty between the Inline_Stack_* macros and the XS macros that are required to write a callback that returns something.

      Shame really. If the Inline_Stack macros were a little more complete or compatible, the abilty to avoid perlguts/perlcall would be invaluable, but a denison of the Inline list reckons that things are unlikely to change anytime soon. For a hap'th of tar. :(


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        I have adapted the solution for the binary search sort, and I also have one for a heap. Here are the two examples:

        The binary search case:

        The heap case:

        Any pointers as to how to discover the caller's package from XS/Inline::C?

        Update: Never mind. Just dropping the 'main::' part seems to work fine--so long as they don't embed a package statement into the comparator sub.


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.
Re: Using $a and $b from XS
by BrowserUk (Patriarch) on Feb 26, 2005 at 03:47 UTC

    Addendum: Should anyone come along and see this thread and be tempted to follow suit and "optimise" the "callback-from-XS/C-code", Beware, it's hard!.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.