in reply to Re^9: Better mousetrap (getting top N values from list X)
in thread Better mousetrap (getting top N values from list X)

I read once that the $a,$b trick was done for performance reasons. I know the optimimum use of sort is for perl to call predefined routines. However, sort achieves reasonable performance even for nonstandard user-defined functions. Here is what I tried, which seg-faulted. I need more in-depth perlguts knowledge to make it work. Probably I should give up and write the individual sorts.
#!/usr/bin/perl use strict; use warnings; use Algorithm::Numerical::Shuffle qw( shuffle ); use Inline C => 'DATA'; my $max = 10; my @values = 1 .. $max; my @values_mixed = shuffle(@values); my @top = topNbs(5, sub { $a <=> $b }, \@values_mixed); print join(" ",sort { $a <=> $b } @top),"\n"; __END__ __C__ int callComp( SV* cmp, GV *first, GV *second, SV *a, SV *b) { int count, rv; dSP; ENTER; SAVETMPS; PUSHMARK(SP); PUTBACK; GvSV(first) = a; GvSV(second) = b; if( ( count = call_sv( cmp, G_SCALAR|G_NOARGS) ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; FREETMPS; LEAVE; } void topNbs( int n, SV *cmp, AV*data) { SV **topN; GV *first, *second; int len = av_len( data ); int i, j, k; int left, right; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for (i = 0; i < n+1; i++) { topN[i] = newSViv( 0 ); } first = gv_fetchpv("a", TRUE, SVt_PV); second = gv_fetchpv("b", TRUE, SVt_PV); for( i = 0; i <= len; i++ ) { SV *val = av_fetch( data, i, 0 ); int comp = callComp(cmp, first, second, val, topN[ n - 1]); if (comp <= 0) continue; left = 0; right = n - 1; while (left < right) { int middle = (left + right) >> 1; if (callComp(cmp, first, second, val, topN[ n - 1]) <= 0) { left = middle + 1; } else { right = middle; } } if( callComp(cmp, first, second, topN[ left ], val) < 0) { for( k = n; k > left; k-- ) topN[ k ] = topN[ k-1 ]; topN[ left ] = val; } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }
Update: I must have been thinking in C++. I have moved the GV* declarations up. It still doesn't do what I want, (now it tells me the '<=>' operations are being applied to undefined values sometimes, then seg faults).

Here is some sample output:

Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. 0 0 0 0 0 Attempt to free unreferenced scalar.

Replies are listed 'Best First'.
Re^11: Better mousetrap (getting top N values from list X)
by BrowserUk (Patriarch) on Feb 05, 2005 at 12:35 UTC

    This (as posted) works. But--if you change the constants in the perl code (100 & 5) to higher values, it segfaults. I can increase 100 to ~112 before this happens, but 5 only to 6 or 7?

    I have no idea how this would occur--but it is just one of a series of "weirdness" that befell me getting it this far.

    If this is XS--yeesh!

    #! perl -slw use strict; use List::Util qw[ shuffle ]; use Inline C => 'DATA'; $^W = 0; my $max = 100; my @values = 1 .. $max; my @values_mixed = shuffle( @values ); my @top = topN( 5, sub { $_[ 0 ] <=> $_[ 1 ]; }, \@values_mixed ); print "@top"; __END__ __C__ int callComp( SV* cmp, SV* a, SV* b ) { int rv; dSP;ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( newSViv( SvIVX( a ) ) ); XPUSHs( newSViv( SvIVX( b ) ) ); PUTBACK; call_sv( cmp, G_SCALAR ); SPAGAIN; rv = POPi; FREETMPS;LEAVE; return rv; } void topN( int n, SV* comp, AV*data ) { int *topN; int len = av_len( data ); int i, j, k; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for( i = 0; i < n+1; i++ ) topN[ i ] = newSViv( 0 ); for( i = 0; i <= len; i++ ) { SV* val = *av_fetch( data, i, 0 ); for( j = 0; j < n; j++ ) { int cmp = callComp( comp, topN[ j ], val ); if( cmp >= 0 ) continue; if( cmp < 0 ) { for( k = n; k > j; k-- ) topN[ k ] = topN[ k-1 ]; topN[ j ] = val; break; } } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re^11: Better mousetrap (getting top N values from list X)
by BrowserUk (Patriarch) on Feb 05, 2005 at 00:18 UTC

    Hmmm. Now you understand why I said "called but not used the comparator function" back up there somewhere. I could never get the values I was stacking to show up in the sub either?

    I simplified things by moving all the GV stuff into the callComp() wrapper, which allows me to call the comparator sub without segfaults, but I still cannot access the values either via @_ or $a/$b?

    int callComp( SV* cmp, SV *a, SV *b) { int count, rv; GV *first = gv_fetchpv("a", TRUE, SVt_PV); GV *second = gv_fetchpv("b", TRUE, SVt_PV); dSP; ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( a ); XPUSHs( b ); PUTBACK; GvSV(first) = a; GvSV(second) = b; if( ( count = call_sv( cmp, G_SCALAR|G_NOARGS) ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; FREETMPS; LEAVE; }

    That said, when I look at what is done in the reduce() function in List::Util.xs, we are only scratching the surface yet. I've read perguts a dozen times over the last two days, but it's one of those documents that you have to experiment with each nuance of every sentance to understand the meaning and implications.

    I was kind of hoping that Inline::C + perlapi would allow me to bypass the requirement for acquiring that understanding--but it seems not.

    I'm not sure that I am interested in becoming an XS programmer. The only help available seems to be the p5p list--and they are all too busy doing, to be interested in teaching to do. Oh well!


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.