http://qs1969.pair.com?node_id=1216799


in reply to Re^6: Inline::C on Windows: how to improve performance of compiled code?
in thread Inline::C on Windows: how to improve performance of compiled code?

Could it be that the hint that vr is looking for is simply to "define PERL_NO_GET_CONTEXT" ?

"Simply"!? No, it isn't simple :-) Not for me. And yes, with this define, a no-op stub performs equally fast on both Linux and threaded Win32, and the test in the OP now takes 3.7 sec, where it previously took ~5 and ~11 sec, respectively. (So, BrowserUk, it looks like this stub wasn't optimized away.) Thank you for the link and explanation; now at least I have some idea what's going on. Hash::Util has this magic incantation as the first line of its XS, while Array::RefElem doesn't have it anywhere, which explains their different speeds, too. My real C code calls SvPV and others; with this define it stops working, as explained in the link you provided. I'll have to solve this, but these are details to work out.

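For readers who have not met this define before, here is a minimal standalone sketch of the idea behind it. It is an analogy, not Perl's actual implementation: `interp_t`, `get_context`, `bump_implicit`, `bump_explicit` and the other names are hypothetical. On a threaded perl the API macros normally fetch the current interpreter on every call. With PERL_NO_GET_CONTEXT defined, the code must already have the interpreter pointer in hand, which in real XS is spelled with the pTHX_/aTHX_ macros or with dTHX at the top of a function. That is presumably why code calling SvPV stops compiling until one of those is added.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interpreter structure; not Perl's real type. */
typedef struct {
    long counter;
} interp_t;

/* A threaded perl keeps this pointer in thread-local storage; a plain
   static is used here only to keep the sketch portable. */
static interp_t *current_interp = NULL;
static long lookups = 0;            /* how many times the context was fetched */

void set_context(interp_t *in) { current_interp = in; }
void reset_lookups(void)       { lookups = 0; }
long lookup_count(void)        { return lookups; }

/* Stands in for the per-call lookup hidden inside the API macros. */
interp_t *get_context(void)
{
    ++lookups;
    return current_interp;
}

/* Implicit style: each "API call" fetches the context itself. */
void bump_implicit(void)
{
    get_context()->counter++;
}

/* Explicit style: the caller already holds the context and passes it in. */
void bump_explicit(interp_t *in)
{
    in->counter++;
}

/* n calls, n context lookups. */
void run_implicit(int n)
{
    for (int i = 0; i < n; ++i)
        bump_implicit();
}

/* n calls, one context lookup (fetched once, like dTHX in a function). */
void run_explicit(int n)
{
    interp_t *in = get_context();
    for (int i = 0; i < n; ++i)
        bump_explicit(in);
}
```

Calling `run_implicit(1000)` performs 1000 lookups, while `run_explicit(1000)` performs one. How expensive each lookup is depends on the platform's thread-local storage, which would be consistent with the larger gain reported on threaded Win32.
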
Re^8: Inline::C on Windows: how to improve performance of compiled code?
by BrowserUk (Patriarch) on Jun 17, 2018 at 04:00 UTC
    (So, BrowserUk, it looks like this stub wasn't optimized away.)

    Hm. If you look above, you'll see that the 'call' from the XS wrapper to void test( SV *sv ) { ++i; } gets inlined to just 1 instruction:

    ; 31   :     test(sv);

        inc     DWORD PTR i

    However, defining PERL_NO_GET_CONTEXT doesn't change a thing in the generated assembler. Of course, that is pre-optimisation code, so your timings may be a better indicator.

    That said, I think you would be better off looking at ways to try and move some or all of your loop into C, rather than trying to optimise the calls from Perl to C.

    What I mean is, if you are calling from Perl -> C 10e8 times, then your Perl code must consist of one or more loops. Whilst there are obviously some savings to be had by minimising the perl -> C -> perl transitions, there is (probably) a much larger saving to be had by moving the loop into C and avoiding all, or a large number, of those transitions.

    As an extreme example, the de Bruijn sequence generator I recently ported from Python to Perl takes 1587 seconds to generate the de Bruijn sequence for 8-char substrings from a 10-char alphabet; but when ported to C, that drops to 0.58 seconds (a 99.96% reduction!):

    C:\test>DeBruijnX -N=8 -ALPHA=0123456789
    Took: 1586.944328 secs
    100000000
    Took: 0.579065 secs
    100000000

    And a very large part of that massive saving is avoiding the perl function call overhead of the 16 million recursive function calls involved:

    #! perl -slw
    use strict;
    # use Config; print $Config{ ccflags };
    use Inline C => Config => BUILD_NOISY => 1; #, CCFLAGS => $Config{ ccflags } . "/link /FAs";
    use Inline C => <<'END_C', NAME => '_deBruijn', CLEAN_AFTER_BUILD => 0;
    #define PERL_NO_GET_CONTEXT 1

    int n, iseq;
    STRLEN k;
    char *seq, *a;

    void dbc( int t, int p ) {
        int i;
        if( t > n ) {
            if( n % p == 0 )
                for( i = 1; i <= p; ++i ) seq[ iseq++ ] = a[ i ];
        }
        else {
            a[ t ] = a[ t - p ];
            dbc( t+1, p );
            for( i = a[ (t - p) ] + 1; i < k; ++i ) {
                a[ t ] = i;
                dbc( t+1, t );
            }
        }
    }

    SV *deBruijnC( SV *svAlphabet, SV *len ) {
        int i;
        char *alphabet = SvPV( svAlphabet, k );
        n = (int)SvIV( len );
        iseq = 0;
        Newxz( seq, (int)pow( (double)k, (double)n), char );
        Newxz( a, k * n, char );
        dbc( 1, 1 );
        for( i = 0; i < iseq ; ++i ) {
            seq[ i ] = alphabet[ seq[ i ] ];
        }
        return newSVpv( seq, iseq );
    }
    END_C

    Defining PERL_NO_GET_CONTEXT doesn't stop it from running, but it doesn't improve performance one iota.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
      Defining PERL_NO_GET_CONTEXT doesn't stop it from running, but it doesn't improve performance one iota

      But you've defined PERL_NO_GET_CONTEXT too late for it to have any effect.

      It needs to be done prior to the inclusion of the 3 perl headers (EXTERN.h, perl.h and XSUB.h).
      Hence, it needs to be done in the Inline::C Config section as
      PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1',
      (The specified string, followed by a newline, will then be inserted in the generated XS file above the inclusion of those 3 headers.)
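The ordering point can be seen in plain C without any Perl headers. The sketch below is only an illustration of preprocessor behaviour: `EARLY_FLAG`, `LATE_FLAG` and the `*_SEEN` names are hypothetical, and the marked "header" section stands in for what a header does when it tests a feature macro at inclusion time.

```c
/* Stand-in for a header that tests a feature macro when it is included.
   EARLY_FLAG, LATE_FLAG and the *_SEEN names are hypothetical. */

#define EARLY_FLAG 1            /* defined before the "header" section */

/* ---- begin "header" section ---- */
#ifdef EARLY_FLAG
#  define EARLY_SEEN 1
#else
#  define EARLY_SEEN 0
#endif

#ifdef LATE_FLAG
#  define LATE_SEEN 1
#else
#  define LATE_SEEN 0
#endif
/* ---- end "header" section ---- */

#define LATE_FLAG 1             /* defined after the checks: too late */

/* Report what the "header" section saw for each flag. */
int early_seen(void) { return EARLY_SEEN; }
int late_seen(void)  { return LATE_SEEN; }
```

Here `early_seen()` returns 1 and `late_seen()` returns 0. The late define is legal C and compiles without complaint, but the checks that matter have already been evaluated, which matches the "runs but no speedup" observation above.
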

      UPDATE:

      I think you would be better off looking at ways to try and move some or all of your loop into C, rather than trying to optimise the calls from Perl to C

      I totally agree with that. I think that's where the most significant savings will be found.

      Cheers,
      Rob
        PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1',

        Ah yes. I'd completely forgotten about that.

        (A quick search of my test directory shows it was Jan 2015 the last time I played with this, and I've slept since then :) )

