in reply to profiling XS routines
One thought. Does your current profiling show that you are spending an excessive amount of time in each call to the XS sub? Or is the time accumulated over many, many relatively quick calls?
The point being that to make really effective use of XS routines for speeding things up, each call to the XS routine should do as much work as possible. This is because there is a significant overhead involved in the transfers into and out of an XS subroutine.
For example, PDL does its thing very efficiently, and if you have a million numbers to be processed and pass them to PDL as a single lump, the speed-up over a pure Perl equivalent is enormous. But if your requirements mean that you have to process those numbers as 100,000 sets of 10, then your gains will be far less dramatic--maybe even negative. This is because the time saved by processing just 10 numbers in C is outweighed by the overhead of the transfer into XS and back.
A second thought. When it comes to profiling C code, calls to system timing routines (clock() etc.) can also have a significant overhead. And if you combine each call to get the time with another to printf the results somewhere, the overhead can completely distort the timing of the code under test.
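To get a feel for that overhead on a given machine, something like the following could be used. It is only a rough sketch in standard C: the function name `clock_call_cost_ns` is made up for illustration, and because it times `clock()` with `clock()` itself, the result is limited by the resolution of `clock()` on your platform.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: average cost, in nanoseconds of CPU time,
 * of a single call to clock(), estimated over 'calls' calls. */
static double clock_call_cost_ns(long calls)
{
    volatile clock_t sink = 0;   /* volatile so the loop is not optimised away */
    clock_t start, end;
    long i;

    start = clock();
    for (i = 0; i < calls; i++)
        sink = clock();
    end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)calls;
}
```

Calling `clock_call_cost_ns(1000000)` and printing the result gives a ballpark figure to compare against the run time of the code being profiled.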
If you are using an Intel (or other x86) processor, there is a single instruction, rdtsc, which gets you a high-resolution timestamp counter with very little overhead. If your compiler allows in-line assembler--or has a __rdtsc() intrinsic--then this can be combined with a small piece of C--as a macro--that fetches the counter and writes it to a buffer along with a line number (from __LINE__) very efficiently. That allows you to tag various points in your C code and gather some stats without overly distorting the timing.
I don't have an example to hand. The last time I did this was on my old machine, so the source is archived off on a CD somewhere. But it shouldn't be too hard to recreate it if it would be useful?