http://qs1969.pair.com?node_id=1216778


in reply to Re^5: Inline::C on Windows: how to improve performance of compiled code?
in thread Inline::C on Windows: how to improve performance of compiled code?

Why is it necessary to call Perl_get_context() 9 times for EVERY CALL to such a simple function?

I think that is unnecessary, and I would expect that those Perl_get_context() calls could be removed by declaring
PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1',
in your script's "Config" section.

If I don't define PERL_NO_GET_CONTEXT, then for me your script outputs:
Took 0.160126seconds 1000000
With PERL_NO_GET_CONTEXT defined it runs twice as quickly:
Took 0.072088seconds 1000000
(I've ignored the CCFLAGS output that is also produced.)
AIUI, the problem with defining PERL_NO_GET_CONTEXT in Inline::C scripts is that it causes breakage if any of the Inline::C functions call Perl API functions.
But none of the functions in your script call Perl API functions, so it's ok to define PERL_NO_GET_CONTEXT.
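
For what it's worth, here is a rough sketch in plain C of why those calls add up. It is not Perl's actual code: demo_interp, demo_get_context() and the bump_*() functions are made-up stand-ins. The idea is only that, on a threaded perl without PERL_NO_GET_CONTEXT, each API macro looks the interpreter up again, whereas with it defined the pointer is fetched once and passed along as an argument:

```c
/* Hypothetical stand-in for the interpreter struct; not Perl's real layout. */
typedef struct { long counter; } demo_interp;

static demo_interp the_interp;      /* a single interpreter for the demo */
static long context_lookups = 0;    /* counts simulated context lookups */

/* Simulates the lookup a threaded perl performs (thread-local storage
   in the real thing); here it only counts how often it is called. */
static demo_interp *demo_get_context(void) {
    ++context_lookups;
    return &the_interp;
}

/* Implicit style: every "API call" fetches the context itself. */
static void bump_implicit(void) {
    demo_get_context()->counter++;
}

/* Returns the number of lookups needed for api_calls implicit calls. */
long run_implicit(int api_calls) {
    context_lookups = 0;
    for (int i = 0; i < api_calls; ++i)
        bump_implicit();
    return context_lookups;
}

/* Explicit style: the context is handed in as an argument. */
static void bump_explicit(demo_interp *my_perl) {
    my_perl->counter++;
}

/* Returns the number of lookups when the context is fetched once. */
long run_explicit(int api_calls) {
    context_lookups = 0;
    demo_interp *my_perl = demo_get_context();
    for (int i = 0; i < api_calls; ++i)
        bump_explicit(my_perl);
    return context_lookups;
}
```

With 9 simulated API calls, run_implicit(9) performs 9 lookups and run_explicit(9) performs 1.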

Could it be that the hint that vr is looking for is simply to "define PERL_NO_GET_CONTEXT" ?

Cheers,
Rob

Re^7: Inline::C on Windows: how to improve performance of compiled code?
by BrowserUk (Patriarch) on Jun 16, 2018 at 17:48 UTC
    (I've ignored the CCFLAGS output that is also produced.)

    Just debug.

    AIUI, the problem with defining PERL_NO_GET_CONTEXT in Inline::C scripts is that it causes breakage if any of the Inline::C functions call Perl API functions. But none of the functions in your script call Perl API functions, so it's ok to define PERL_NO_GET_CONTEXT.

    Indeed. 'Tis unfortunate that almost every function that does anything useful needs to call at least one perl API.

    It is the case that many, if not all-but-one, of the Perl_get_context() calls get optimised away, but getting your hands on the post-optimised assembly is only possible by using a debugger, and it means relating any bug back to the pre-optimised C is a nightmare.

    Could it be that the hint that vr is looking for is simply to "define PERL_NO_GET_CONTEXT" ?

    Possibly; but getting his hands on the assembler output would be the surest way of finding out. That has to be possible with gcc/mingw, right?

    I still think that the chances are that gcc is optimising his c-stub and perl callable wrapper away completely.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit
      'Tis unfortunate that almost every function that does anything useful needs to call at least one perl API

      Indeed ... though it's also often the case that many functions that call the Perl API don't really need to.
      Here's a simplistic example:
      use strict;
      use warnings;

      use Inline C => Config =>
        PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1',
        ;

      use Inline C => <<'EOC';

      /*
      SV * foo(int x) {
        return newSViv(x);
      }
      */

      int foo(int x) {
        return x;
      }

      EOC

      my $x = foo(-1234);
      print $x;
      Both renditions of foo() do essentially the same thing.
      But the rendition that has been commented out won't work when PERL_NO_GET_CONTEXT is defined, whereas the other rendition will.

      So there are possibilities even with the current Inline::C, depending upon how much time and energy you're prepared to devote in order to avoid the Perl API.
      But mostly, it's not worth the effort.
      (Of course, ideally you wouldn't even have to concern yourself with such matters when using Inline::C, and Ingy has indicated (in the link I provided earlier) that this might all be fixed in Inline::C following the Perl Conference that begins in the next day or so.)
      In the meantime, if you want to define PERL_NO_GET_CONTEXT, then I think you're generally going to have to create an XS module.

      Cheers,
      Rob
        Both renditions of foo() do essentially the same thing. But the rendition that has been commented out won't work when PERL_NO_GET_CONTEXT is defined, whereas the other rendition will.

        But all that does is move the mapping from int to SV from explicit to implicit; i.e., it moves the mapping from the C function to the Inline::C wrapper code.

        And 95% of the overhead is (already) in the wrapper code.

        Maybe the benefits of PERL_NO_GET_CONTEXT are confined to gcc/mingw, but I have just tried it in two different pieces of code (and previously when it came up also) and it never seems to make a jot of difference.


Re^7: Inline::C on Windows: how to improve performance of compiled code?
by vr (Curate) on Jun 16, 2018 at 23:40 UTC
    Could it be that the hint that vr is looking for is simply to "define PERL_NO_GET_CONTEXT" ?

    "Simply"!? No, it isn't simple :-) Not for me. And yes, with this define, a no-op stub runs equally fast on Linux and on threaded Win32, and the test in the OP now takes 3.7 sec, where it took ~5 and ~11 sec, respectively. (So, BrowserUk, it looks like this stub wasn't optimized away.) Thank you for the link and the explanation; now I at least have some idea of what's going on. Hash::Util has this magic incantation as the first line of its XS, while Array::RefElem doesn't have it anywhere, which explains their different speeds, too. My real C code calls SvPV and others, and with this define it stops working, as explained in the link you provided. I'll have to solve that, but those are details to work out.

      (So, BrowserUk, it looks like this stub wasn't optimized away.)

      Hm. If you look above, you'll see that the 'call' from the XS wrapper to void test( SV *sv ) { ++i; } gets inlined to just 1 instruction:

      ; 31 : test(sv);

          inc DWORD PTR i

      However, defining PERL_NO_GET_CONTEXT doesn't change a thing in the generated assembler. Of course, that is pre-optimisation code, so your timings may be a better indicator.

      That said, I think you would be better off looking at ways to try and move some or all of your loop into C, rather than trying to optimise the calls from Perl to C.

      What I mean is, if you are calling from Perl -> C 10e8 times, then your Perl code must consist of one or more loops. Whilst there are obviously some savings to be had by minimising the perl -> C -> perl transitions, there is (probably) a much larger saving to be had by moving the loop into C and avoiding all, or a large number, of those transitions.
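
      As a minimal sketch of the difference, assuming the per-call work is as trivial as the ++i stub above (test_once(), test_many() and the counter helpers are hypothetical names, written as plain C without the Inline::C glue):

      ```c
      static long i_counter = 0;

      /* One Perl -> C transition per increment: the Perl loop would
         have to call this n times. */
      void test_once(void) {
          ++i_counter;
      }

      /* One Perl -> C transition for the whole loop: Perl would call
         this once and the iteration happens entirely in C. */
      void test_many(long n) {
          for (long k = 0; k < n; ++k)
              ++i_counter;
      }

      /* Helpers so the result can be inspected and reset. */
      long get_counter(void)   { return i_counter; }
      void reset_counter(void) { i_counter = 0; }
      ```

      Both routes leave the counter in the same state; the second just pays the call overhead once instead of n times.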

      As an extreme example, the de Bruijn sequence generator I recently ported from Python to Perl takes 1587 seconds to generate the de Bruijn sequence for 8-char substrings from a 10-char alphabet; but when ported to C, that drops to 0.58 seconds (a 99.96% reduction!):

      C:\test>DeBruijnX -N=8 -ALPHA=0123456789
      Took: 1586.944328 secs 100000000
      Took: 0.579065 secs 100000000

      And a very large part of that massive saving is avoiding the perl function call overhead of the 16 million recursive function calls involved:

      #! perl -slw
      use strict;
      # use Config; print $Config{ ccflags };
      use Inline C => Config => BUILD_NOISY => 1; #, CCFLAGS => $Config{ ccflags } . "/link /FAs";
      use Inline C => <<'END_C', NAME => '_deBruijn', CLEAN_AFTER_BUILD =>0;
      #define PERL_NO_GET_CONTEXT 1

      int n, iseq;
      STRLEN k;
      char *seq, *a;

      void dbc( int t, int p ) {
          int i;
          if( t > n ) {
              if( n % p == 0 )
                  for( i = 1; i <= p; ++i )
                      seq[ iseq++ ] = a[ i ];
          }
          else {
              a[ t ] = a[ t - p ];
              dbc( t+1, p );
              for( i = a[ (t - p) ] + 1; i < k; ++i ) {
                  a[ t ] = i;
                  dbc( t+1, t );
              }
          }
      }

      SV *deBruijnC( SV *svAlphabet, SV *len ) {
          int i;
          char *alphabet = SvPV( svAlphabet, k );
          n = (int)SvIV( len );
          iseq = 0;
          Newxz( seq, (int)pow( (double)k, (double)n), char );
          Newxz( a, k * n, char );
          dbc( 1, 1 );
          for( i = 0; i < iseq ; ++i ) {
              seq[ i ] = alphabet[ seq[ i ] ];
          }
          return newSVpv( seq, iseq );
      }
      END_C

      Defining PERL_NO_GET_CONTEXT doesn't stop it from running, but it doesn't improve performance one iota.


        Defining PERL_NO_GET_CONTEXT doesn't stop it from running, but it doesn't improve performance one iota

        But you've defined PERL_NO_GET_CONTEXT too late for it to have any effect.

        It needs to be done prior to the inclusion of the 3 perl headers (EXTERN.h, perl.h and XSUB.h).
        Hence, it needs to be done in the Inline::C Config section as
        PRE_HEAD => '#define PERL_NO_GET_CONTEXT 1',
        (The specified string, followed by a newline, will then be inserted in the generated XS file above the inclusion of those 3 headers.)
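
        A stripped-down illustration in plain C of why the order matters. The #ifdef block is a made-up stand-in for what the perl headers do at the point of inclusion, and the DEMO_* names are hypothetical:

        ```c
        /* Stand-in for a header such as perl.h: it inspects the macro
           at the point where it is included, and never again. */
        #ifdef DEMO_NO_GET_CONTEXT
        #  define DEMO_CONTEXT_STYLE 1   /* explicit: context passed as argument */
        #else
        #  define DEMO_CONTEXT_STYLE 0   /* implicit: context looked up per call */
        #endif

        /* Defined only after the "header" has been processed: too late
           to influence the choice made above. */
        #define DEMO_NO_GET_CONTEXT 1

        /* Reports which style the "header" actually selected. */
        int demo_context_style(void) {
            return DEMO_CONTEXT_STYLE;
        }
        ```

        Here demo_context_style() returns 0, i.e. the late #define was ignored; moving it above the #ifdef block (which is what PRE_HEAD does for the real headers) would flip the result to 1.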

        UPDATE:

        I think you would be better off looking at ways to try and move some or all of your loop into C, rather than trying to optimise the calls from Perl to C

        I totally agree with that. I think that's where the most significant savings will be found.

        Cheers,
        Rob