in reply to Re^3: Using $a and $b from XS
in thread Using $a and $b from XS

Having got this to work reliably, it seems obvious that whilst Inline::C is convenient relative to XS, it has limitations and imposes overheads that make it less than ideal for the writing of this type of extension.

Looking closely at the code for the callComp() function, whilst moving the callback into a separate sub allowed me to get something to work, the overhead it adds is skewing the results. This is especially true of the current implementation of the AB version which is fetching the globals $a and $b for every invocation. That could be fixed, but all my attempts to inline the callback code go nowhere.

There is also a saving to be made by not creating a separate array of SV*s to hold the topNs and then later copying them to the stack. They could be written to and manipulated directly on the stack using EXTEND(SP,n) and ST(n). Or maybe using TARGS? but XS uses dSP; Inline C uses dXARGS; and TARGS requires dTARGS; all of which seem to conflict with one another.

Maybe what is really needed is an Inline XS? The convenience of "Run, tweak. Run, tweak." with the "project management" taken care of, but without the mismatch between the C being converted XS being converted to C and their disparate environmental setups.

I tried lifting the code for reduce() from List::Util as a model of XS done well and trying pursuade Inline C to manage it for me, but again, the environment that Inline C gives you does not work with the XS macros it uses.

I said somwehere else, I feel a little like Pandora. I've resisted getting into XS till now, and I'm beginning to wish I had kept up that resistance. Inline C is seductive in it's simplicity of converting opcode intensive Perl code into fast running C, but it's limitations push you towards XS if you try to do something more than manipulated a few numbers or strings and the return a result. The need to call back to Perl, and Inline Cs apparent limitations in that area push you into another ballpark--and as I said in the other thread--it's a ballgame that I am not sure that I wish to be playing. It has complicated rules, a very sparse playbook and not much by way of a "noobie school"!


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re^5: Using $a and $b from XS
by tall_man (Parson) on Feb 07, 2005 at 01:28 UTC
    Maybe some time could be saved by looking up the symbols once and then filling in the aliases each time through the loop. That's what they do in the sort driver code. (I'm not an XS expert either. I'm still hoping an expert will read this thread and give us some other ideas.)
      Maybe some time could be saved by looking up the symbols once and then filling in the aliases each time through the loop

      That could definitely be done. By looking them up once at the top of the main routine and then reverting to your original scheme of passing them into the comparator sub each time (almost as constants) in addition to values to set into them would work, but just pushing and popping them as constants each comparison is going to have a hit.

      So then you look at assigning the values into them inside the main routine, before passing them through. That's good, but it requires cut&paste reuse, complicates the calling of the comparator and makes the code messy.

      That could probably be alleviated by the use of a couple of macros to assign the values to the globals inline to the call to the comparator, but when I tried it, the compiler generated streams of errors.

      It necessitates calling macros within macros, including using some of those macros as lvalues of values returned by other macros. And worse, I don't really understand how some of those macros--which are themselves defined in terms of yet more macros--work! (Does anyone?) Therefore, I am not able to work out why what I was trying to do went bellyup.

      Normally, if I get "macro expansion" problems, I just ask the compiler to output the preprocessor output so I can see what the actual effect is, but just trying to track down which damn source file is being passed to the compiler is the first problem. Trying to call it with the appropriate environment (compiler flags--libraries etc) outside of the MakeMaker nest-of-vipers is impossible.

      So then I tried passing the /P flag via the Inline C options, which works, but as every source file includes every damn header in the whole crumbling edifice, it produces a file of 3/4 of a megabyte and ONE HUNDRED AND THIRTEEN THOUSAND NINE HUNDRED AND FOURTY EIGHT LINES!

      And what lines. Here is what your nice elegant topNbs() function looks like by the time the C compiler comes to convert it to machine code:

      void XS_main_topNbs(register PerlInterpreter *my_perl , CV* cv); void XS_main_topNbs(register PerlInterpreter *my_perl , CV* cv) { register SV **sp = (*Perl_Tstack_sp_ptr(((PerlInterpreter *)pthrea +d_getspecific((*Perl_Gthr_key_ptr(0)))))); register SV **mark = (*Per +l_Tstack_base_ptr(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr +_key_ptr(0)))))) + (*(*Perl_Tmarkstack_ptr_ptr(((PerlInterpreter *)pt +hread_getspecific((*Perl_Gthr_key_ptr(0))))))--); I32 ax = mark - (*P +erl_Tstack_base_ptr(((PerlInterpreter *)pthread_getspecific((*Perl_Gt +hr_key_ptr(0)))))) + 1; I32 items = sp - mark; if (items != 3) Perl_croak(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr_key +_ptr(0)))), "Usage: main::topNbs(n, cmp, data)"); sp -= items; { int n = (int)((((*Perl_Tstack_base_ptr(((PerlInterpreter *)pthr +ead_getspecific((*Perl_Gthr_key_ptr(0))))))[ax + (0)])->sv_flags & 0x +00010000) ? ((XPVIV*) ((*Perl_Tstack_base_ptr(((PerlInterpreter *)pth +read_getspecific((*Perl_Gthr_key_ptr(0))))))[ax + (0)])->sv_any)->xiv +_iv : Perl_sv_2iv(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr +_key_ptr(0)))), (*Perl_Tstack_base_ptr(((PerlInterpreter *)pthread_ge +tspecific((*Perl_Gthr_key_ptr(0))))))[ax + (0)])); SV * cmp = (*Perl_Tstack_base_ptr(((PerlInterpreter *)pthread_g +etspecific((*Perl_Gthr_key_ptr(0))))))[ax + (1)]; AV * data; #line 258 "topN_pl_c7f1.xs" I32* temp; #line 314 "topN_pl_c7f1.c" if ((((*Perl_Tstack_base_ptr(((PerlInterpreter *)pthread_getspecif +ic((*Perl_Gthr_key_ptr(0))))))[ax + (2)])->sv_flags & 0x00080000) && +((((XRV*) ((*Perl_Tstack_base_ptr(((PerlInterpreter *)pthread_getspec +ific((*Perl_Gthr_key_ptr(0))))))[ax + (2)])->sv_any)->xrv_rv)->sv_fla +gs & 0xff)==SVt_PVAV) data = (AV*)((XRV*) ((*Perl_Tstack_base_ptr(((PerlInterpreter +*)pthread_getspecific((*Perl_Gthr_key_ptr(0))))))[ax + (2)])->sv_any) +->xrv_rv; else Perl_croak(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr +_key_ptr(0)))), "data is not an array reference"); #line 260 "topN_pl_c7f1.xs" temp = (*Perl_Tmarkstack_ptr_ptr(((PerlInterpreter *)pthread_getsp +ecific((*Perl_Gthr_key_ptr(0))))))++; topNbs(n, cmp, data); if ((*Perl_Tmarkstack_ptr_ptr(((PerlInterpreter *)pthread_getspeci +fic((*Perl_Gthr_key_ptr(0)))))) != temp) { (*Perl_Tmarkstack_ptr_ptr(((PerlInterpreter *)pthread_getspecifi +c((*Perl_Gthr_key_ptr(0)))))) = temp; do { do { IV tmpXSoff = (0); (*Perl_Tstack_sp_ptr(((PerlInterpre +ter *)pthread_getspecific((*Perl_Gthr_key_ptr(0)))))) = (*Perl_Tstack +_base_ptr(((PerlInterpreter *)pthread_getspecific((*Perl_Gthr_key_ptr +(0)))))) + ax + (tmpXSoff - 1); return; } while (0); } while (0); } return; #line 330 "topN_pl_c7f1.c" (*Perl_Tstack_sp_ptr(((PerlInterpreter *)pthread_getspecific((*Per +l_Gthr_key_ptr(0)))))) = sp; return; } }

      Now whilst the compiler may be able to make sense of it in that form, my eyes refuse to retain focus long enough to read from one end of a single line to the other.

      I did try to reformat that to my usual coding standard, but--even standalone, away from the other 114, 000 lines--it caused my usually reliable reformatting macros to segfault my editor!

      And do you see those oh so helpful line numbers embedded in compiler directives? Apart from the fact that they relate to intermediate files--some of which appear to get deleted--not the original source code--they are also, to the best of my ability to correlate them, WRONG!

      So how does anyone debug this stuff? Symbolic debuggers (do they help)? Embedded break points? Reverse binary osmosis? Pressing their ear to the cpu and listening to the frequency of the hum?

      I'm still hoping an expert will read this thread and give us some other ideas.

      Hmmm. Maybe...


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.

      Here is my final incarnation of the comparator caller "function" plus a version of topNbs() that uses it.

      By re-writing the PUSHMARK() macro, I managed to squeeze the calling of the via-$A$B comparator into another macro and avoid (the little) overhead that imposed.

      I've also done away with the separate SV** topN; and associated memory allocation and freeing by building the return list directly on the stack.

      If your going to adapt your topNheap(), note the order and positioning of the large group of XS macros--they are critical.

      #define MYPUSHMARK(p) \ ( *PL_markstack_ptr = ( ++PL_markstack_ptr == PL_markstack_max ) \ ? ( markstack_grow(), (p) - PL_stack_base ) \ : ( (p) - PL_stack_base ) ) #define CMP( callback, aa, bb ) ( \ ( GvSV( a ) = ( aa ) ), \ ( GvSV( b ) = ( bb ) ), \ ( MYPUSHMARK(SP) ), \ ( call_sv( callback, G_SCALAR|G_NOARGS ) ), \ ( SPAGAIN ), \ ( POPi ) \ ) void topNbsAB( int n, SV *cmp, AV*data ) { int i, k; int left, right; int len = av_len( data ); GV* a = gv_fetchpv( "main::a", TRUE, SVt_PV ); GV* b = gv_fetchpv( "main::b", TRUE, SVt_PV ); dSP; dMARK; dAX; POPMARK; POPs; POPs; POPs; EXTEND( SP, n ); PUSHMARK( SP ); for( i = 0; i < n; i++ ) { PUSHs( newSViv( 0 ) ); } PUTBACK; for( i = 0; i <= len; i++ ) { SV *val = *av_fetch( data, i, 0 ); if( CMP( cmp, val, ST( n-1 ) ) < 0 ) continue; left = 0; right = n; while (left < right) { int middle = ( left + right ) >> 1; if( CMP( cmp, val, ST( middle ) ) <= 0 ) { left = middle + 1; } else { right = middle; } } for( k = n; k >= left; k-- ) ST( k ) = ST( k - 1 ); ST( left ) = val; } XSRETURN( n ); }

      The upshot in my tests is that whilst using $a and $b is quicker than passing via the stack for small numbers of callbacks, as that number grows, the extra cost of lookup of globals (in the comparator sub itself) overtakes the saving of stacking the parameters in the C code.

      Once again, when you move above about 1% N of T, sort starts to win by not having to call the comparator at all.

      I'd be really interested to see what a real XS programmer would make of the above?


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        This last example does not work for me. For example, when I try:
        my $max = 10; my @values = 1 .. $max; my @values_mixed = shuffle(@values); my @top = topNbsAB(5, sub { $a <=> $b }, \@values_mixed); print join(" ",sort { $a <=> $b } @top),"\n";

        I get a result like:

        1 1 1 1 9

        I'm also working out how to avoid the extra creation of zero-fill items. Ideally, I would replace them with the first n items of the input in reverse-sorted order. There are two reasons for this:

        1) We cannot yet handle negative numbers.

        2) We should be testing for the case that the input length is less than or equal to n, and in that case returning everything they gave us. Something like:

        // Special case -- too few values provided. Just return them. if (len+1 <= n) { for( i = 0; i < len; i++ ) { SV *val = *av_fetch( data, i, 0 ); PUSHs( newSVsv( val ) ); } PUTBACK; XSRETURN(len+1); }

      Gah! I guess I'm hooked :|

      Pushing things along a little further, I succeeded in coding a macro (that doesn't break), to take care of the assignments to $a & $b and then call the comparator, reducing some of the overhead in the process:

      This appears to be reliable and gives it another step up over sort for smallish lists from biggish ones:

      P:\test>topn -MAX=10000 -N=10 1 trial of sort [10 of 10000] ( 17.186ms total), 17.186/trial 1 trial of topNbsAB [10 of 10000] ( 10.843ms total), 10.843/trial 1 trial of topNbs@ [10 of 10000] ( 17.276ms total), 17.276/trial P:\test>topn -MAX=10000 -N=100 1 trial of sort [100 of 10000] ( 16.568ms total), 16.568/trial 1 trial of topNbsAB [100 of 10000] ( 15.485ms total), 15.485/trial 1 trial of topNbs@ [100 of 10000] ( 22.263ms total), 22.263/trial P:\test>topn -MAX=100000 -N=100 1 trial of sort [100 of 100000] (231.679ms total), 231.67/trial 1 trial of topNbsAB [100 of 100000] (143.079ms total), 143.07/trial 1 trial of topNbs@ [100 of 100000] (187.500ms total), 187.50/trial

      However, as soon as size of N approaches 10% of the main list, sort once again takes a lead:

      P:\test>topn -MAX=100000 -N=1000 [SNIP] 1 trial of sort [1000 of 100000] (231.573ms total), 231.57/tria +l 1 trial of topNbsAB [1000 of 100000] (298.991ms total), 298.99/tria +l 1 trial of topNbs@ [1000 of 100000] (296.875ms total), 296.87/tria +l

      And if the list gets really large and the N very small (which ought to favour the linear search + binary sort over full sort and slice), sort once again wins hands down. More interestingly, topNbs() (via @_) starts to show better again?

      1 trial of sort [10000 of 1000000] ( 3.156s total), 3.156s/tr +ial 1 trial of topNbsAB [10000 of 1000000] ( 13.641s total), 13.641/tr +ial 1 trial of topNbs@ [10000 of 1000000] ( 4.734s total), 4.734s/tr +ial

      I think this is due to the allocation, initialisation of the SV* topN array, followed by the incremental extension of the perl stack through XPUSHs as we copy the topN array onto the stack.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.