Re^4: OT How fast a cpu to overwhelm Time::HiRes

But switching processes is not cheap.

That depends on your OS ;-). Linux tries to make process context switches extremely cheap (and as a result you can get away with using processes instead of threads for parallel performance). This recent mail on LKML states that on a 3GHz P4 the 2.6 kernel can do up to 700,000 process context switches per second. That's only if the processes do nothing except switch; the mail goes on to explain that under normal workloads you'd only get about 10,000 switches per second. I just took a quick look around the machines I have at hand, and found one which reports an average of ~60,000 cs/s over a period of fifteen minutes (via sar -w). Running the lat_ctx benchmark from the lmbench suite on an AMD64 machine gives me a minimum context switch overhead of 0.55 microseconds. Given these figures, it seems conceivable to me that a context switch can take place in significantly under a microsecond on an extremely fast processor with a large cache. I agree this is definitely not something you'd expect to happen, but it seems possible.
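For anyone without sar installed, a rough cross-check of the cs/s figure is possible by sampling the kernel's cumulative `ctxt` counter in /proc/stat twice. This is only a sketch and assumes a Linux system; the helper names `ctxt_from_stat` and `read_ctxt` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the cumulative context-switch counter from /proc/stat contents.
sub ctxt_from_stat {
    my ($text) = @_;
    return $text =~ /^ctxt\s+(\d+)/m ? $1 : undef;
}

# Read /proc/stat once; returns undef where the file is unavailable.
sub read_ctxt {
    open(my $fh, '<', '/proc/stat') or return;
    local $/;                      # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return ctxt_from_stat($text);
}

my $before = read_ctxt();
if (defined $before) {
    sleep 1;
    my $after = read_ctxt();
    print "about ", $after - $before, " context switches in the last second\n";
} else {
    print "no ctxt counter found in /proc/stat\n";
}
```

The counter covers the whole system, so the number includes whatever else the machine is doing during that second.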

Anyway, code walks, as they say. Here's a little script which forks off a couple of processes and tries to get the same gettimeofday value in different children:

#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday usleep);

my $parent_time=(gettimeofday)[0]+5;

my $children=10;
my $measurements=5000;
my $pid;
for my $child (1..$children) {
    if ($pid=fork()) {
    } elsif (defined $pid) {
        my ($times,@temp_times);
        # Make all children start measuring as nearly simultaneously as we can
        while(1){
            last if ((gettimeofday)[0]>$parent_time);
            usleep 1;
        }
        # Get time measurements
        for (1..$measurements) {
            @temp_times=gettimeofday();
            $times.=$temp_times[0].sprintf("%06d",$temp_times[1])."\n";
            usleep 2;
        }
        sleep 10;
        open(my $record,">","timerecord$child") or die "Can't open record file";
        print $record $times;
        close $record or die "Can't close record file";
        exit;
    } else {
        die("Cannot fork");
    }
}

# Wait for all children to finish (waitpid returns -1 once none are left)
my $kid;
do {
    $kid = waitpid(-1, 0);
} until $kid == -1;


# Put measurements into a hashtable and end if any duplicates are found
my %measured;
for my $child (1..$children) {
    open(my $record,"<","timerecord$child") or die "Can't open record file";
    while(<$record>) {
        chomp;
        if (exists($measured{$_})) {
            print "Found duplicate: $_, gettimeofday returned the same value in child $child and $measured{$_}\n";
            exit;
        }
        $measured{$_}=$child;
    }
    close $record or die "Can't close record file";
}

# Check for shortest time passed between two measurements
my $difference=42;
my ($t1,$t2);
my $last_i=0;
my $last_t=0;
for my $t (sort keys %measured) {
    next if($last_i == $measured{$t});

    my $cur_diff=$t-$last_t;
    if ($cur_diff<$difference) {
        $difference=$cur_diff;
        ($t1,$t2)=($last_t,$t);
    }
    $last_t=$t;
    $last_i=$measured{$t};
}
print "Found minimum delay of $difference between $t2 - child $measured{$t2} and $t1 - child $measured{$t1}\n";

On SMP machines this easily finds duplicate measurements, so (no surprise there) calls to gettimeofday can return the same value from different processes on SMP. The smallest interval between measurements from different children that I was able to achieve on a single-processor machine was 11 microseconds. Strangely enough, this was not the fastest CPU I tried it on by far, so I suspect it has something to do with the Linux kernel version (this one is running the Debian 2.6.8 kernel, whereas all the others have newer versions).

So, at least from this practical test it appears you are right, processes don't switch quickly enough for Time::HiRes to return the same result.
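For comparison, here is a sketch of a slightly different minimum-delay scan. It assumes what we want is the gap between each timestamp and its immediate predecessor whenever the two come from different children, so the previous timestamp is updated on every iteration. The sub name `min_cross_gap` and the sample data are invented for illustration; the keys use the same seconds-plus-six-digit-microseconds format as the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given a hashref of timestamp => child number, return the smallest gap
# between two adjacent timestamps recorded by different children, together
# with the two timestamps. Returns an empty list if no such pair exists.
sub min_cross_gap {
    my ($measured) = @_;
    my ($best, $prev_t, $prev_child, @pair);
    for my $t (sort { $a <=> $b } keys %$measured) {
        my $child = $measured->{$t};
        if (defined $prev_t && $child != $prev_child) {
            my $gap = $t - $prev_t;
            if (!defined $best || $gap < $best) {
                $best = $gap;
                @pair = ($prev_t, $t);
            }
        }
        # Always remember the most recent timestamp, whichever child took it
        ($prev_t, $prev_child) = ($t, $child);
    }
    return defined $best ? ($best, @pair) : ();
}

# Made-up sample data: child 1 measures twice, then child 2, then child 1
my %sample = (
    '1133400000000010' => 1,
    '1133400000000025' => 1,
    '1133400000000031' => 2,
    '1133400000000090' => 1,
);
my ($gap, $from, $to) = min_cross_gap(\%sample);
print "minimum cross-child gap: $gap us ($from -> $to)\n";
```

With the sample data the closest cross-child pair is the second reading of child 1 and the reading of child 2, six microseconds apart.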

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. -- Brian W. Kernighan


Re^5: OT How fast a cpu to overwhelm Time::HiRes by BrowserUk (Patriarch) on Dec 01, 2005 at 01:54 UTC
That depends on your OS ;-). Linux tries to make process context switches extremely cheap (and as a result you can get away with using processes instead of threads for parallel performance). This recent mail on LKML states that on a 3GHz P4 the 2.6 kernel can do up to 700,000 process context switches per second. That's only if the processes do nothing except switch; the mail goes on to explain that under normal workloads you'd only get about 10,000 switches per second.

I just ran the following script:

#!perl -slw
use strict;
use threads;

sub thread{
    Win32::Sleep 0 while 1;
}

my @threads = map{ threads->create( \&thread ) } 1 .. 100;
<STDIN>;

Which sets 100 threads going that do nothing but relinquish the processor in a tight loop. With a single copy of this running, I get sustained measurements of 320,000 context switches/second with occasional peaks of up to 345,000.

However, if I set a second copy of the script running concurrently, so that roughly 1 in 2 context switches will be a process swap as well as a thread switch, the numbers drop to a sustained average of around 215,000/s, which clearly shows the extra cost of switching processes (on my OS :).

It would be interesting to see the numbers you get for a Linux system. Do you run a threaded Perl?

However, you do not have to do very much at all to drop these figures way down. Calling gettimeofday() on each thread slows this to around 90,000/second for one copy of the 100 thread process, with a second copy of the script bringing it down to 80,000 or so, and each subsequent process causing a similar drop.

#!perl -slw
use strict;
use threads;
use Time::HiRes qw[ gettimeofday ];

sub thread{
    my( $s, $u );
    while( 1 ){
        Win32::Sleep 0;
        ( $s, $u ) = gettimeofday();
    }
}

my @threads = map{ threads->create( \&thread ) } 1 .. 100;
<STDIN>;

Of course, do any form of IO, or anything that semaphores (like shared variable accesses), and the number drops like a stone.
So, at least from this practical test it appears you are right, processes don't switch quickly enough for Time::HiRes to return the same result.

No, but it would appear possible for it to happen from within the same process--under Linux and 5.6.2 at least. See the subthread starting at Re^2: OT How fast a cpu to overwhelm Time::HiRes.

I couldn't get anywhere near it on my system, but the need to convert between Win32 APIs and the returns dictated by the gettimeofday() call has a significant impact. Even going direct to the high performance timer from C, I couldn't get any closer than just over 1 microsecond elapsed.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^6: OT How fast a cpu to overwhelm Time::HiRes by tirwhan (Abbot) on Dec 01, 2005 at 09:45 UTC
Hmm, how do you measure context switches between threads? This is done in the perl process internally, so the OS doesn't know anything about them, or am I missing something? Anyway, if I modify your code to use fork instead of threading I do not get any significant amount of process context switching in `vmstat` either. I assume this is because of the way the Linux scheduler works (it assigns long timeslices to CPU-bound processes and preempts them if there is a higher-priority IO-bound task). I'll have to figure out how to make Linux context-switch rapidly from perl.

Do you run a threaded Perl?

Yes, 5.8.4 i386-linux-thread-multi. I modified my code to use threads instead of processes and got the minimal delay time down to 10 microseconds (from 11 with processes). So it looks like the context switch overhead for Linux Perl processes can be only minimally higher than that of Perl threads (for this benchmark; other workloads are bound to exhibit totally different behaviour). What do you get when you run my code?

No, but it would appear possible for it to happen from within the same process--under Linux and 5.6.2 at least.

Yes, I can confirm that this is true for 5.8.4 as well. So it seems to be possible to get identical results from Time::HiRes if (a) subsequent calls are made quickly from one process, or (b) several processes are running on an SMP architecture. Or on hyperthreaded processors? I don't have a non-SMP HT machine to test this with, but I'd guess it is possible there as well.
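The single-process case can be checked with something along these lines. It is a minimal sketch, assuming that "duplicate" means two back-to-back gettimeofday calls returning the same seconds and microseconds; `count_repeats` is a name made up for this example, and the count will vary from machine to machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Call gettimeofday $n times back to back and count how often a reading
# equals the one taken immediately before it.
sub count_repeats {
    my ($n) = @_;
    my $repeats = 0;
    my ($last_s, $last_us) = gettimeofday();
    for (2 .. $n) {
        my ($s, $us) = gettimeofday();
        $repeats++ if $s == $last_s && $us == $last_us;
        ($last_s, $last_us) = ($s, $us);
    }
    return $repeats;
}

my $n = 100_000;
printf "%d of %d consecutive readings were identical\n",
    count_repeats($n), $n - 1;
```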
Re^7: OT How fast a cpu to overwhelm Time::HiRes by BrowserUk (Patriarch) on Dec 01, 2005 at 10:18 UTC
Hmm, how do you measure context switches between threads?

I use the system performance monitoring tool, perfmon. I would assume that the context switches logged by sar should reflect the thread switches also.

This is done in the perl process internally, so the OS doesn't know anything about them, or am I missing something?

No. Perl never does its own context switching. Unlike, say, Java, which has User-mode threads and runs its own mini-scheduler internally to the process, Perl uses Kernel-mode threads, which are scheduled (naturally enough) by the kernel.

With User-mode threads, all the threads of a single process share the timeslots allocated to the process by the OS. With Kernel-mode threads, each thread is a distinct OS-schedulable unit and gets a full OS timeslice each time.

The code I posted should run on a multi-threaded Perl under Linux, except you would have to change `Win32::Sleep 0;` for `yield;`.

There is no point in my running your code, as fork() does not create real processes under win32; it spawns threads.
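A bounded variant of that experiment for a threaded Perl on Linux might look like the sketch below. It is not the script posted above: each thread yields a fixed number of times and then exits, so the run ends on its own, and the sub name `run_yielders` is made up for illustration. Note that `threads->yield()` is only a hint to the scheduler and may have little effect on some systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Spawn $n_threads kernel threads that each give up the CPU $n_yields times.
# Returns the total number of yields performed, or undef on a perl that was
# built without ithreads.
sub run_yielders {
    my ($n_threads, $n_yields) = @_;
    return unless $Config{useithreads};
    require threads;
    my @workers = map {
        threads->create(sub {
            my $done = 0;
            for (1 .. $n_yields) {
                threads->yield();   # relinquish the processor
                $done++;
            }
            return $done;
        });
    } 1 .. $n_threads;
    my $total = 0;
    for my $worker (@workers) {
        my ($count) = $worker->join;
        $total += $count;
    }
    return $total;
}

my $total = run_yielders(10, 1000);
print defined $total
    ? "performed $total yields\n"
    : "this perl has no ithreads\n";
```

Watching `vmstat` or `sar -w` while a larger run is in progress should show whether the yields actually translate into context switches.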
Re^8: OT How fast a cpu to overwhelm Time::HiRes by tirwhan (Abbot) on Dec 01, 2005 at 14:23 UTC
Re^9: OT How fast a cpu to overwhelm Time::HiRes by BrowserUk (Patriarch) on Dec 01, 2005 at 15:00 UTC