First, thank you for arguing with numbers. It is a rare event and most welcome.
But -- you knew that was coming, right? -- your benchmark:
Let's say the total runtime was at the upper end of your vague estimate: 8 seconds.
Which means that, by anybody's standards, a whole 5 billionths of a second difference is hardly "huge" (which was your assertion).
And if the body of the loop did anything useful -- like call one or two of the huge macros or long, twisty functions that are the reason for having the context within the sub in the first place -- then those 5 nanoseconds would just disappear into the noise.
And the bracketing of TLSGetValue() with (useless*) calls to GetLastError() and SetLastError() -- as you point out -- entirely swamps the call itself.
As we discussed before, what last error are they preserving that is important enough to be preserved, yet not important enough to be reported straight away?
And if there is justification for preserving some system errors whilst ignoring others, why preserve them in OS memory, thus requiring every unimportant system call to be bracketed with GLE/SLE? Why not get the error just after the important system call that caused it, and put it somewhere local?
That way, you do one GetLastError() call after each (significant) system call whose error you want to preserve, rather than bracketing every insignificant system call with two other system calls.
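In code, the alternative I'm describing is just this (a minimal, self-contained sketch; the failing CreateFileA() stands in for a "significant" system call):

    #include <windows.h>
    #include <stdio.h>

    int main( void ) {
        DWORD savedErr = ERROR_SUCCESS;
        HANDLE h = CreateFileA( "does-not-exist.tmp", GENERIC_READ, 0,
            NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );

        if( h == INVALID_HANDLE_VALUE )
            savedErr = GetLastError();   /* capture once, into a local */

        /* Insignificant calls may now clobber the thread-local last
           error freely; no GLE/SLE bracketing is needed around them. */
        Sleep( 0 );

        printf( "preserved error: %lu\n", savedErr );
        if( h != INVALID_HANDLE_VALUE ) CloseHandle( h );
        return 0;
    }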
My prime suspect for why TLSGetValue() doesn't get inlined is the fact that it is bracketed by those other two calls. I'd love to see you add a third test to your benchmark that calls TLSGetValue() directly. I'm not saying it will be inlined, but even if it isn't, it would reduce the (already nanoscopic) difference quite considerably.
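For concreteness, that third test might look something like this (a minimal sketch; the API is spelled TlsGetValue() in the headers, and the slot setup, the iteration count, and the QueryPerformanceCounter() harness are my assumptions):

    #include <windows.h>
    #include <stdio.h>

    int main( void ) {
        DWORD slot = TlsAlloc();         /* assume success for brevity */
        LARGE_INTEGER f, s, e;
        void *p = NULL;
        int i;

        TlsSetValue( slot, &slot );
        QueryPerformanceFrequency( &f );
        QueryPerformanceCounter( &s );
        for( i = 0; i < 100000000; ++i )
            p = TlsGetValue( slot );     /* no GLE/SLE bracketing */
        QueryPerformanceCounter( &e );

        printf( "direct TlsGetValue: %.3f secs (%p)\n",
            ( double )( e.QuadPart - s.QuadPart ) / f.QuadPart, p );
        TlsFree( slot );
        return 0;
    }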
The reason functions need to have visibility of the context is that some of the functions they call require it to be passed to them.
This requirement is often hidden by wrapping the functions that need it in macros. You know better than I do how grossly unwieldy many of the wrapper macros get.
There is a common pattern to many of the worst ones that goes something like this:
    #define SOMETHING1 STMT_START {                \
        assert( something );                       \
        if( some_complex_condition )               \
            wrapped_function1( aTHX_ ... );        \
        assert( something_else );                  \
    } STMT_END

    #define SOMETHING2 STMT_START {                \
        assert( something );                       \
        if( some_complex_condition )               \
            wrapped_function2( aTHX_ ... );        \
        assert( something_else );                  \
    } STMT_END

    #define SOMETHING3 STMT_START {                \
        assert( something );                       \
        if( some_complex_condition )               \
            wrapped_function3( aTHX_ ... );        \
        assert( something_else );                  \
    } STMT_END

    int someFunction( ... ) {
        dTHX;
        ...;
        SOMETHING1( ... );
        ...;
        SOMETHING2( ... );
        ...;
        SOMETHING3( ... );
        RETURN;
    }
The logic being (I assume) that by testing the conditions inline, you avoid the call overhead for the cases where the condition(s) fail.
But a simple test shows that this isn't the case:
With x1(), 50% of calls are avoided by an inline conditional test.
With x2(), that test is moved into the body of the function, which returns immediately if the test fails.
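The test source itself didn't survive above, so for anyone wanting to reproduce it, here is a minimal sketch consistent with that description (the __rdtsc() harness, the noinline hints, and the 50/50 condition i & 1 are my assumptions, not the original code):

    /* calloverhead.c -- a sketch of the inline-vs-inbody condition test */
    #include <stdio.h>
    #include <stdlib.h>
    #include <intrin.h>

    volatile int sink;                         /* defeat dead-code elimination */

    /* noinline, so we measure call overhead rather than inlined bodies */
    __declspec( noinline ) void x1( int i ) {  /* caller tests the condition */
        sink += i;
    }

    __declspec( noinline ) void x2( int i ) {  /* test moved into the body   */
        if( !( i & 1 ) ) return;
        sink += i;
    }

    int main( int argc, char **argv ) {
        int n = atoi( argv[ 1 ] ), i;
        unsigned __int64 start;

        start = __rdtsc();
        for( i = 0; i < n; ++i )
            if( i & 1 ) x1( i );               /* inline conditional test */
        printf( "Inline condition: %I64u\n", __rdtsc() - start );

        start = __rdtsc();
        for( i = 0; i < n; ++i )
            x2( i );                           /* call unconditionally */
        printf( "Inbody condition: %I64u\n", __rdtsc() - start );

        return 0;
    }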
Compile & run:
    C:\test>cl /Ox calloverhead.c
    Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64
    Copyright (C) Microsoft Corporation.  All rights reserved.

    calloverhead.c
    Microsoft (R) Incremental Linker Version 9.00.21022.08
    Copyright (C) Microsoft Corporation.  All rights reserved.

    /out:calloverhead.exe
    calloverhead.obj

    C:\test>calloverhead 10000000
    Inline condition: 60,068,106
    Inbody condition: 45,064,458

    C:\test>calloverhead 10000000
    Inline condition: 60,037,515
    Inbody condition: 45,084,879

    C:\test>calloverhead 10000000
    Inline condition: 60,048,828
    Inbody condition: 45,057,681

    C:\test>calloverhead 10000000
    Inline condition: 60,032,691
    Inbody condition: 45,032,724
And if the conditional tests are inside the body of the functions, you no longer need the macro wrappers -- which makes things a lot clearer for the programmer.
And you also don't need access to the context in all the callers of the wrapped functions, so the called function can obtain the context internally, thus removing it from visibility at the caller's level.
And the code size shrinks because the conditional test appears once inside the function rather than at every call site.
That's a three-way win, with no downsides.
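To make that restructuring concrete, here is a self-contained toy (the names, the condition, and the stand-in context fetch are all illustrative, not from the Perl sources):

    #include <assert.h>
    #include <stdio.h>

    typedef struct { int flags; } Context;

    static Context theContext = { 1 };      /* stands in for the TLS slot  */

    static Context *get_context( void ) {   /* stands in for TlsGetValue() */
        return &theContext;
    }

    /* After the restructuring: both the test and the context fetch live
       inside the body, so callers need neither a macro nor a context.  */
    void doit( int v ) {
        Context *cx = get_context();
        assert( cx != NULL );
        if( !( cx->flags & 1 ) ) return;    /* the formerly-inline test */
        printf( "doing %d\n", v );
    }

    int main( void ) {
        doit( 42 );                         /* no wrapper, no visible context */
        return 0;
    }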
The point is that you cannot take one single aspect of the overall vision, mock it up into a highly artificial benchmark, and draw conclusions from it. You have to consider the entire picture.
Of course, it is never going to happen, so there is little point in arguing about it; but if you did effect this kind of change throughout the code base, along with all the other stuff we discussed elsewhere, the effects could be significant.
The hope for using LLVM to compile the Perl runtime is that, by re-writing the macro-infested C sources to IR, and combining them with the current compilation unit of Perl code that uses them -- also suitably compiled to IR -- it can see through both the macros and the disjoint runloop, and find optimisations on a case-by-case basis that could not be made universally.
That is to say (by way of example), a piece of code that uses no magic, and only IVs or UVs, may qualify for optimisations that could not be made statically by a C compiler, because -- given the current structure of the pp_* opcode functions -- it could never possibly see them; it always has to allow for the possibility of magic, and NVs, and PVs, et al.
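By way of a toy illustration (illustrative C only; this is not the real pp_add, and the type tags are invented):

    #include <stdio.h>

    typedef enum { T_IV, T_NV, T_PV, T_MAGIC } Kind;
    typedef struct { Kind kind; long iv; double nv; } Val;

    /* The general op: must allow for every representation, every time. */
    long add_general( const Val *a, const Val *b ) {
        if( a->kind == T_MAGIC || b->kind == T_MAGIC ) {
            /* would have to invoke get-magic here */
        }
        if( a->kind == T_NV || b->kind == T_NV )
            return ( long )( a->nv + b->nv );
        return a->iv + b->iv;               /* the IV-only fast path */
    }

    int main( void ) {
        Val a = { T_IV, 2, 0.0 }, b = { T_IV, 3, 0.0 };
        /* With caller and callee both in IR, an optimiser that proves
           both kinds are T_IV can reduce this call to a bare a + b.  */
        printf( "%ld\n", add_general( &a, &b ) );
        return 0;
    }

A static C compiler must keep every branch of add_general() alive; an optimiser that sees the caller's IR alongside it can discard all but the IV path for that one call site.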