
That depends on your OS ;-). Linux tries to make process context switches extremely cheap (and as a result you can get away with using processes instead of threads for parallel performance). A recent mail on LKML states that on a 3GHz P4 the 2.6 kernel can do up to 700,000 process context switches per second. That's only if the processes do nothing except switch; the mail goes on to explain that under normal workloads you'd only get about 10,000 switches per second.
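Inverting those two rates gives a rough cost per switch. That cost is what has to be compared with the 1 microsecond granularity of gettimeofday(). This is plain arithmetic on the figures quoted above, nothing more:

$$
\frac{1}{700\,000\ \mathrm{s}^{-1}} \approx 1.4\ \mu\mathrm{s},
\qquad
\frac{1}{10\,000\ \mathrm{s}^{-1}} = 100\ \mu\mathrm{s}
$$

Even the best case is longer than one microsecond. On a single CPU, two processes that read the clock on either side of a switch should therefore normally see different values.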

I just ran the following script:

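#!perl -slw
use strict;
use threads;

# Each thread does nothing but give up its timeslice (Win32-specific call).
sub thread{
    Win32::Sleep 0 while 1;
}

# Start 100 such threads, then wait for Enter before exiting.
my @threads = map{ threads->create( \&thread ) } 1 .. 100;

<STDIN>;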

This starts 100 threads that do nothing but relinquish the processor in a tight loop. With a single copy of it running, I get sustained measurements of 320,000 context switches/second, with occasional peaks of up to 345,000.

However, if I run a second copy of the script concurrently, so that roughly 1 in 2 context switches is a process switch as well as a thread switch, the numbers drop to a sustained average of around 215,000/s. That clearly shows the extra cost of switching processes (on my OS :). It would be interesting to see the numbers you get on a Linux system. Do you run a threaded Perl?

However, you don't have to do very much at all to drive these figures way down. Calling gettimeofday() in each thread's loop slows this to around 90,000/second for one copy of the 100-thread process. A second copy of the script brings it down to 80,000 or so, and each further copy causes a similar drop.

#!perl -slw
use strict;
use threads;
use Time::HiRes qw[ gettimeofday ];

# As before, but each thread also reads the clock after every yield.
sub thread{
    my( $s, $u );
    while( 1 ){
        Win32::Sleep 0;
        ( $s, $u ) = gettimeofday();
    }
}

my @threads = map{ threads->create( \&thread ) } 1 .. 100;

<STDIN>;

Of course, do any form of IO, or anything that needs a lock (such as accessing shared variables), and the number drops like a stone.

So, at least from this practical test, it appears you are right: processes don't switch quickly enough for Time::HiRes to return the same result.

No, but it would appear possible for it to happen from within the same process--under Linux and 5.6.2 at least. See the subthread starting at Re^2: OT How fast a cpu to overwhelm Time::HiRes.

I couldn't get anywhere near that on my system, because converting what the Win32 APIs return into the values gettimeofday() must return has a significant impact. Even going direct to the high-performance timer from C, the smallest elapsed time I could get between two calls was just over 1 microsecond.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: OT How fast a cpu to overwhelm Time::HiRes
by tirwhan (Abbot) on Dec 01, 2005 at 09:45 UTC

    Hmm, how do you measure context switches between threads? This is done in the perl process internally, so the OS doesn't know anything about them, or am I missing something?

    Anyway, if I modify your code to use fork instead of threads, I don't see any significant amount of process context switching in vmstat either. I assume this is because of the way the Linux scheduler works: it assigns long timeslices to CPU-bound processes and preempts them if there's a higher-priority IO-bound task. I'll have to figure out how to make Linux context-switch rapidly from Perl.
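For reference, vmstat's cs column comes from the system-wide ctxt counter in /proc/stat. That counter counts every task switch the kernel performs, including switches between threads. The sketch below samples the same counter directly. It assumes Linux, and parse_ctxt, read_ctxt and ctxt_per_second are hypothetical helper names, not part of any module.

```perl
use strict;
use warnings;
use Time::HiRes qw( sleep );

# Extract the system-wide context-switch counter (the "ctxt" line) from
# the text of Linux's /proc/stat. Returns undef if the line is missing.
sub parse_ctxt {
    my ($stat_text) = @_;
    return $stat_text =~ /^ctxt\s+(\d+)/m ? $1 : undef;
}

# Read /proc/stat in one go and return the current ctxt value.
sub read_ctxt {
    open my $fh, '<', '/proc/stat' or die "Cannot open /proc/stat: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $ctxt = parse_ctxt($text);
    die "no ctxt line in /proc/stat\n" unless defined $ctxt;
    return $ctxt;
}

# Sample the counter twice, $interval seconds apart, and return the rate.
sub ctxt_per_second {
    my ($interval) = @_;
    my $before = read_ctxt();
    sleep $interval;
    my $after = read_ctxt();
    return ( $after - $before ) / $interval;
}

if ( -r '/proc/stat' ) {
    printf "%.0f context switches/s (system-wide)\n", ctxt_per_second(1);
}
```

Running this alongside one of the benchmarks gives the same figure vmstat reports, sampled over one second.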

    Do you run a threaded Perl?

    Yes, 5.8.4 i386-linux-thread-multi.

    I modified my code to use threads instead of processes and got the minimum delay down to 10 microseconds (from 11 with processes). So it looks like the context-switch overhead for Linux Perl processes may be only slightly higher than that of Perl threads (for this benchmark; other workloads are bound to behave quite differently). What do you get when you run my code?

    No, but it would appear possible for it to happen from within the same process--under Linux and 5.6.2 at least.

    Yes, I can confirm that this is true for 5.8.4 as well. So it seems possible to get identical results from successive Time::HiRes calls if

    • successive calls are made quickly from one process, or
    • several processes are running on an SMP machine, or
    • (possibly) the processes run on a hyperthreaded processor. I don't have a non-SMP HT machine to test this with, but I'd guess it is possible there as well.
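The first case is easy to check from a single process. The sketch below calls gettimeofday() back to back and counts how many calls returned exactly the same (seconds, microseconds) pair as the call before. It also records the smallest non-zero gap between calls. probe_gettimeofday is a hypothetical helper name, and the call count is an arbitrary choice. If one trip round the Perl loop takes less than a microsecond on your machine, I would expect the repeat count to be well above zero.

```perl
use strict;
use warnings;
use Time::HiRes qw( gettimeofday );

# Call gettimeofday() $n times back to back. Return the number of calls
# that repeated the previous (seconds, microseconds) pair exactly, and the
# smallest positive gap seen (in microseconds, undef if none was seen).
sub probe_gettimeofday {
    my ($n) = @_;
    my ( $repeats, $smallest ) = ( 0, undef );
    my ( $ps, $pu ) = gettimeofday();
    for ( 2 .. $n ) {
        my ( $s, $u ) = gettimeofday();
        my $gap = ( $s - $ps ) * 1_000_000 + ( $u - $pu );
        if ( $gap == 0 ) {
            $repeats++;
        }
        elsif ( $gap > 0 && ( !defined $smallest || $gap < $smallest ) ) {
            $smallest = $gap;    # negative gaps (clock stepped back) are ignored
        }
        ( $ps, $pu ) = ( $s, $u );
    }
    return ( $repeats, $smallest );
}

my $calls = 200_000;
my ( $dups, $min_gap ) = probe_gettimeofday($calls);
printf "%d of %d consecutive calls repeated the previous value\n",
    $dups, $calls - 1;
printf "smallest non-zero gap: %s microseconds\n",
    defined $min_gap ? $min_gap : 'n/a';
```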

    Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan
      Hmm, how do you measure context switches between threads?

      I use the system performance monitoring tool, perfmon. I would assume that the context-switch counts logged by sar include thread switches too.

      This is done in the perl process internally, so the OS doesn't know anything about them, or am I missing something?

      No. Perl never does its own context switching. That is unlike, say, Java in its original "green threads" form, where user-mode threads were run by a mini-scheduler inside the process. Perl uses kernel-mode threads, which are scheduled (naturally enough) by the kernel.

      With user-mode threads, all the threads of a single process share the timeslices the OS allocates to that process. With kernel-mode threads, each thread is a distinct OS-schedulable unit and gets a full OS timeslice each time it is scheduled.
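On Linux, every kernel-schedulable thread of a process shows up as a numbered entry under /proc/<pid>/task. That gives a quick way to see that Perl threads are kernel threads. The sketch below counts those entries before and while three Perl threads are running. It assumes Linux and a perl built with ithreads, and it exits early otherwise. kernel_task_count is a hypothetical helper name. If Perl scheduled its threads inside the interpreter, the count would not change.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # This demo needs a perl built with ithreads and Linux's /proc filesystem.
    unless ( $Config{useithreads} && -d "/proc/$$/task" ) {
        print "skipped: needs a threaded perl on Linux\n";
        exit 0;
    }
}
use threads;

# Count this process's kernel tasks: one numbered entry per schedulable thread.
sub kernel_task_count {
    opendir my $dh, "/proc/$$/task" or die "Cannot open /proc/$$/task: $!";
    my $count = grep { /^\d+$/ } readdir $dh;
    closedir $dh;
    return $count;
}

my $before = kernel_task_count();

# Three Perl threads that just sleep, so they are still alive when we count.
my @workers = map { threads->create( sub { sleep 2 } ) } 1 .. 3;
my $during  = kernel_task_count();
$_->join for @workers;

print "kernel tasks before: $before, with 3 Perl threads running: $during\n";
```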

      The code I posted should run on a multi-threaded Perl under Linux, except that you would have to change Win32::Sleep 0; to yield; (that is, threads->yield). There is no point in my running your code, because fork() does not create real processes under Win32; Perl emulates it with threads.
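Below is a bounded, portable variant of the first benchmark. It assumes a perl built with ithreads and exits early otherwise. It uses threads->yield() in place of Win32::Sleep 0 and stops after a fixed number of yields per thread instead of waiting on STDIN. The thread and iteration counts are arbitrary, and yielder is a hypothetical name. The script counts yields, not context switches. threads->yield() is only a hint and may return at once if nothing else is runnable, so the real switch rate still has to be read from vmstat, sar or perfmon while it runs.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # threads.pm can only be loaded by a perl built with ithreads.
    unless ( $Config{useithreads} ) {
        print "skipped: this perl was built without thread support\n";
        exit 0;
    }
}
use threads;
use Time::HiRes qw( time );

# Each worker gives up the CPU $iterations times and reports how many it did.
sub yielder {
    my ($iterations) = @_;
    my $count = 0;
    for ( 1 .. $iterations ) {
        threads->yield();    # scheduler hint; may return immediately
        $count++;
    }
    return $count;
}

my ( $nthreads, $iterations ) = ( 10, 20_000 );
my $start   = time;
my @workers = map { threads->create( \&yielder, $iterations ) } 1 .. $nthreads;

my $total = 0;
for my $thr (@workers) {
    my ($done) = $thr->join;    # threads were created in list context
    $total += $done;
}
my $elapsed = time - $start;

printf "%d yields in %.3f s (about %.0f yields/s)\n",
    $total, $elapsed, $total / ( $elapsed || 1e-6 );
```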


        Perl never does its own context switching.

        Aah, of course! Thanks (I used to know that; I can now remember discussing it with some Java guys a while ago).

        except you would have to change Win32::Sleep 0; for yield
        Thanks. I don't normally use threads, and I had tried using sleep and usleep to get the threads to switch. It works with yield. I ran your code on two different systems, a dual-Xeon 2.8GHz and an AMD64 2GHz. Here's what I got (average figures; spikes were up to +/- 20%):
        context switches/s                  Xeon       AMD64
        single process, empty loop     420,000/s   360,000/s
        two processes, empty loop      330,000/s   220,000/s
        single process, gettimeofday   230,000/s   110,000/s
        two processes, gettimeofday    180,000/s    87,000/s

        But if I change that code to use fork and processes, and add an explicit call to the sched_yield system call via Inline::C:

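#!perl -slw
use strict;
use Time::HiRes qw( gettimeofday );
use Inline C => <<'END_OF_CODE';
#include <sched.h>

/* Give up the CPU via the sched_yield(2) system call. */
void yield_me() {
    sched_yield();
}
END_OF_CODE

# Each child loops forever, yielding the CPU on every iteration.
sub thread {
    while (1) {
        # gettimeofday();   # uncomment for the "gettimeofday" rows
        yield_me();
    }
}

# Fork 10 children; the parent keeps their PIDs so it can stop them later.
my @pids = map {
    my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    thread() if $pid == 0;    # the child never returns from thread()
    $pid;
} 1 .. 10;

<STDIN>;                      # press Enter to stop the benchmark
kill 'TERM', @pids;
waitpid $_, 0 for @pids;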

        I get these results:

        context switches/s                  Xeon       AMD64
        single process, empty loop     865,000/s   550,000/s
        two processes, empty loop      845,000/s   525,000/s
        single process, gettimeofday   345,000/s   130,000/s
        two processes, gettimeofday    340,000/s   125,000/s

        which is quite a bit better.

        I changed my code to use an explicit yield with threads and sched_yield with forks as well. That got me down to a 4 microsecond minimum delay for both on the single-processor system with the older kernel (AMD Athlon, 2.2GHz). So maybe I should retract my earlier retraction ;-). On a system that's faster than mine, or with a kernel better tuned for this (e.g. an RT kernel), it could be possible to get duplicate results from forked processes.

        The really interesting thing about these shenanigans is that context switching between processes seems to be a lot faster than between Perl threads under Linux. That is with a 2.6 kernel; I also ran some of this on a 2.4 system, and the results were not as good. I think I'll need to do some more benchmarks.

