in reply to Re^8: Better mousetrap (getting top N values from list X)
in thread Better mousetrap (getting top N values from list X)

Out of curiousity, was your comparator of the {$a <=> $b} type, or did you pass parameters to it?

I haven't worked out how to set $a and $b yet. This was the C wrapper function I used for calling the comparator.

int callComp( SV* cmp, int a, int b ) { int count, rv; dSP; ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( sv_2mortal( newSViv( a ) ) ); XPUSHs( sv_2mortal( newSViv( b ) ) ); PUTBACK; if( ( count = call_sv( cmp, G_SCALAR ) ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; FREETMPS; LEAVE; }

That said, I am not sure how much difference it would make to the performance? You get the best performance from sort when you don't actually do anything in the comparator sub, which allows perl to completely skip the setting of parameters and the callback to perl.

But I think that the decision that allows perl to do that is embedded deep within the parser, a very specific special case, not something that could easily be transferred to extensions. I don't think it uses the hints mechanism?

That's what I suggested that when the programmer codes @s = sort{$a<=>$b} @u;, the interpreter effectively reads that as @s = sortNumericAscending @u; and sort{$bcmp$a} is effectively translated to 'sortAlphaDescending'.

Indeed, in the latest versions, @s = reverse sort @u; is also interpreted as 'sortAlphaDescending', but I don't think there is any easy way to apply that type of reinterpretation to extensions.

Actually, I think that 4 named variants would be cleaner both from the point of view of understandability: topN() or Ngreatest() doesn't make a great deal of sense in the case of wanting the N lowest values; and efficiency: avoiding the conditional code to decide which way we are ordering.

To fit in with the existing naming conventions of List::Util, they should probably be called: maxN(), minN(), maxNstr() & minNstr(). I don't really like the "defaulting" of min() and max() to numeric--which is the opposite to sort--or the "ommision" of 'num' from the names, but that's what now exists, and consistancy would be a good thing.


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re^10: Better mousetrap (getting top N values from list X)
by tall_man (Parson) on Feb 04, 2005 at 22:36 UTC
    I read once that the $a,$b trick was done for performance reasons. I know the optimimum use of sort is for perl to call predefined routines. However, sort achieves reasonable performance even for nonstandard user-defined functions. Here is what I tried, which seg-faulted. I need more in-depth perlguts knowledge to make it work. Probably I should give up and write the individual sorts.
    #!/usr/bin/perl use strict; use warnings; use Algorithm::Numerical::Shuffle qw( shuffle ); use Inline C => 'DATA'; my $max = 10; my @values = 1 .. $max; my @values_mixed = shuffle(@values); my @top = topNbs(5, sub { $a <=> $b }, \@values_mixed); print join(" ",sort { $a <=> $b } @top),"\n"; __END__ __C__ int callComp( SV* cmp, GV *first, GV *second, SV *a, SV *b) { int count, rv; dSP; ENTER; SAVETMPS; PUSHMARK(SP); PUTBACK; GvSV(first) = a; GvSV(second) = b; if( ( count = call_sv( cmp, G_SCALAR|G_NOARGS) ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; FREETMPS; LEAVE; } void topNbs( int n, SV *cmp, AV*data) { SV **topN; GV *first, *second; int len = av_len( data ); int i, j, k; int left, right; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for (i = 0; i < n+1; i++) { topN[i] = newSViv( 0 ); } first = gv_fetchpv("a", TRUE, SVt_PV); second = gv_fetchpv("b", TRUE, SVt_PV); for( i = 0; i <= len; i++ ) { SV *val = av_fetch( data, i, 0 ); int comp = callComp(cmp, first, second, val, topN[ n - 1]); if (comp <= 0) continue; left = 0; right = n - 1; while (left < right) { int middle = (left + right) >> 1; if (callComp(cmp, first, second, val, topN[ n - 1]) <= 0) { left = middle + 1; } else { right = middle; } } if( callComp(cmp, first, second, topN[ left ], val) < 0) { for( k = n; k > left; k-- ) topN[ k ] = topN[ k-1 ]; topN[ left ] = val; } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }
    Update: I must have been thinking in C++. I have moved the GV* declarations up. It still doesn't do what I want, (now it tells me the '<=>' operations are being applied to undefined values sometimes, then seg faults).

    Here is some sample output:

    Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. Use of uninitialized value in numeric comparison (<=>) at top2.pl line + 11. 0 0 0 0 0 Attempt to free unreferenced scalar.

      This (as posted) works. But--if you change the constants in the perl code (100 & 5) to higher values, it segfaults. I can increase 100 to ~112 before this happens, but 5 only to 6 or 7?

      I have no idea how this would occur--but it is just one of a series of "weirdness" that befell me getting it this far.

      If this is XS--yeesh!

      #! perl -slw use strict; use List::Util qw[ shuffle ]; use Inline C => 'DATA'; $^W = 0; my $max = 100; my @values = 1 .. $max; my @values_mixed = shuffle( @values ); my @top = topN( 5, sub { $_[ 0 ] <=> $_[ 1 ]; }, \@values_mixed ); print "@top"; __END__ __C__ int callComp( SV* cmp, SV* a, SV* b ) { int rv; dSP;ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( newSViv( SvIVX( a ) ) ); XPUSHs( newSViv( SvIVX( b ) ) ); PUTBACK; call_sv( cmp, G_SCALAR ); SPAGAIN; rv = POPi; FREETMPS;LEAVE; return rv; } void topN( int n, SV* comp, AV*data ) { int *topN; int len = av_len( data ); int i, j, k; Inline_Stack_Vars; Newz( 1, topN, n + 1, SV* ); for( i = 0; i < n+1; i++ ) topN[ i ] = newSViv( 0 ); for( i = 0; i <= len; i++ ) { SV* val = *av_fetch( data, i, 0 ); for( j = 0; j < n; j++ ) { int cmp = callComp( comp, topN[ j ], val ); if( cmp >= 0 ) continue; if( cmp < 0 ) { for( k = n; k > j; k-- ) topN[ k ] = topN[ k-1 ]; topN[ j ] = val; break; } } } Inline_Stack_Reset; for( i = 0; i < n; i++ ) Inline_Stack_Push( sv_2mortal( newSVsv( topN[ i ] ) ) ); Safefree( topN ); Inline_Stack_Done; }

      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.

      Hmmm. Now you understand why I said "called but not used the comparator function" back up there somewhere. I could never get the values I was stacking to show up in the sub either?

      I simplified things by moving all the GV stuff into the callComp() wrapper, which allows me to call the comparator sub without segfaults, but I still cannot access the values either via @_ or $a/$b?

      int callComp( SV* cmp, SV *a, SV *b) { int count, rv; GV *first = gv_fetchpv("a", TRUE, SVt_PV); GV *second = gv_fetchpv("b", TRUE, SVt_PV); dSP; ENTER; SAVETMPS; PUSHMARK(SP); XPUSHs( a ); XPUSHs( b ); PUTBACK; GvSV(first) = a; GvSV(second) = b; if( ( count = call_sv( cmp, G_SCALAR|G_NOARGS) ) != 1 ) croak( "Bad comparator" ); SPAGAIN; rv = POPi; FREETMPS; LEAVE; }

      That said, when I look at what is done in the reduce() function in List::Util.xs, we are only scratching the surface yet. I've read perguts a dozen times over the last two days, but it's one of those documents that you have to experiment with each nuance of every sentance to understand the meaning and implications.

      I was kind of hoping that Inline::C + perlapi would allow me to bypass the requirement for acquiring that understanding--but it seems not.

      I'm not sure that I am interested in becoming an XS programmer. The only help available seems to be the p5p list--and they are all too busy doing, to be interested in teaching to do. Oh well!


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.