JohnRS has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks. I seek your wisdom.
I have observed something odd regarding multiprocessing performance on Windows. When I run the test below, there seems to be a *huge* amount of process switching overhead. When I run the same test on a Linux server it runs as expected (almost no overhead). Here are the results.
    ############################################################
    #
    # ithreads on Centos Linux, 64 bit, 8 CPU's
    # Perl v5.10.1 built for x86_64-linux-thread-multi
    #
    # Threads   Clock    CPU   ==>   Speed   Overhead
    # -------   -----   ----         -----   --------
    #    1       18.2   18.2   ==>    1.0x       0%
    #    2        9.1   18.2   ==>    2.0x       0%
    #    3        6.2   18.3   ==>    2.9x       1%
    #    5        3.7   18.2   ==>    4.9x       0%
    #    8        2.4   18.4   ==>    7.6x       1%
    #
    # ithreads on my Windows 7, 64 bit, 8 CPU's
    # Perl v5.12.3 built for MSWin32-x86-multi-thread
    #
    # Threads   Clock    CPU   ==>   Speed   Overhead
    # -------   -----   ----         -----   --------
    #    1       25.0   25.0   ==>    1.0x       0%
    #    2       14.6   28.1   ==>    1.7x      12%
    #    3       12.9   37.0   ==>    1.9x      48%
    #    5        9.9   47.8   ==>    2.5x      91%
    #    8        8.2   62.1   ==>    3.0x     148%
    #
    ############################################################
Running a single child process establishes a baseline, 1.0x speed at 0% overhead. With Linux, running 5 processes, I see a 4.9x speed improvement with less than 1% overhead. Very good. But with Windows, running 5 processes, I see only a 2.5x speed improvement with about 91% overhead! In other words, the speed improvement was only about half of what it should have been and the CPU time almost doubled. What was the CPU doing this extra 91% of the time?
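The Speed and Overhead columns are consistent with two simple ratios against the single-thread baseline: speed is baseline clock time divided by the N-thread clock time, and overhead is the extra CPU time relative to the baseline CPU time. A minimal sketch of that arithmetic follows; the helper name `speedup_and_overhead` is hypothetical and not part of the original test.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): derive the Speed and
# Overhead columns from a baseline run and an N-thread run.
sub speedup_and_overhead {
    my ($base_clock, $base_cpu, $clock, $cpu) = @_;
    my $speed    = $base_clock / $clock;            # wall-clock speedup factor
    my $overhead = ($cpu / $base_cpu - 1) * 100;    # extra CPU time, in percent
    return ($speed, $overhead);
}

# Windows figures from the table above: 1 thread vs. 5 threads.
my ($speed, $overhead) = speedup_and_overhead(25.0, 25.0, 9.9, 47.8);
printf "%.1fx speed, %.0f%% overhead\n", $speed, $overhead;
# prints "2.5x speed, 91% overhead"
```

Feeding in the Linux 5-thread row (18.2, 18.2, 3.7, 18.2) gives 4.9x and 0%, matching the table.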
I realize that the test results aren't very accurate (to within about 10%). I ran them on live, but mostly idle, machines. The deviations in the Windows results are much larger than 10%, however, so I think that they are relevant. Here is the test code.
    use strict;
    use warnings;
    use threads;
    use Time::HiRes 'time';

    my $nr_children = 1;    # number of worker threads for this run
    my @threads;

    my $start = time;

    # Start the workers, then wait for all of them to finish.
    foreach my $i (1 .. $nr_children) {
        $threads[$i] = threads->create(\&Work, $i);
    }
    foreach my $i (1 .. $nr_children) {
        $threads[$i]->join();
    }

    my $stop = time - $start;
    printf "\nclock: %.1f sec\n", $stop;

    my @run = times;        # $run[0] is the user CPU time of this process
    printf "user: %.1f sec\n", $run[0];
    exit;

    #####

    # The fixed total of 20e5 iterations is divided evenly among the workers.
    sub Work {
        my ($i) = @_;
        foreach ( 1 .. (20e5/$nr_children) ) {
            my $acct_nrs = "abc\txyz\tdef\tabc\tghi\tghi";
            my @temp = split(m/\t/, $acct_nrs, -1);
            @temp = ( sort keys %{{ map { $_ => 1 } @temp }} );   # unique, sorted
            my $ans = join(', ', @temp);
        }
        print " $i";
        return;
    }
The processes run compute bound and keep all 8 CPU's (when using 8 child processes) at 100% simultaneously, both on Windows and Linux. There is no I/O (except one print at the end), no blocking, no locking, and no shared memory. The processes last long enough that the setup time shouldn't be very important. Thus I'm left thinking that the overhead would be due to process switching by the operating system.
This test uses ithreads. I also ran a similar test using forks, and the results in both cases, Linux and Windows, were almost identical to the ithreads results.
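The fork-based test is not shown in the post. A minimal sketch of what such a variant might look like, assuming it mirrors the ithreads test, is given below; the `run_forked` name, its parameters, and the reduced iteration count are illustrative assumptions. Note that on Windows, Perl emulates `fork` with interpreter threads (see perlfork), so a fork test there exercises much of the same machinery as the ithreads test.

```perl
use strict;
use warnings;
use Time::HiRes 'time';

# Same per-iteration workload as the ithreads test.
sub Work {
    my ($i, $iterations) = @_;
    foreach ( 1 .. $iterations ) {
        my $acct_nrs = "abc\txyz\tdef\tabc\tghi\tghi";
        my @temp = split(m/\t/, $acct_nrs, -1);
        @temp = ( sort keys %{{ map { $_ => 1 } @temp }} );
        my $ans = join(', ', @temp);
    }
    return;
}

# Hypothetical driver: fork $children workers, split the total work evenly
# among them, and return the elapsed wall-clock time in seconds.
sub run_forked {
    my ($children, $total_iterations) = @_;
    my $start = time;
    my @pids;
    foreach my $i (1 .. $children) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child: do the work, then exit
            Work($i, $total_iterations / $children);
            exit 0;
        }
        push @pids, $pid;                # parent: remember the child
    }
    waitpid($_, 0) for @pids;            # wait for every child to finish
    return time - $start;
}

# Assumed values for illustration: 2 children and 2e5 total iterations
# (reduced from the post's 20e5 so the sketch finishes quickly).
printf "clock: %.1f sec\n", run_forked(2, 2e5);
```

Child CPU time would have to be read from the third and fourth values returned by `times` after the `waitpid` calls, since the children are separate processes rather than threads of the parent.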
I realize that if the processes were normally blocked this wouldn't be as big an issue. But my job is compute bound. So event loops (POE, Coro, etc) wouldn't help. Not even POE's "Wheel", which uses fork, from what I read.
In summary, my questions are:
1) Is my test valid?
2) Is my conclusion valid?
3) Is there a way to get better multiprocessing performance on Windows?
Thanks, John.
Re: Multiprocessing on Windows (Cannot reproduce!)
by BrowserUk (Patriarch) on May 24, 2012 at 01:12 UTC
by JohnRS (Scribe) on May 24, 2012 at 02:21 UTC
by JohnRS (Scribe) on May 24, 2012 at 06:30 UTC
by BrowserUk (Patriarch) on May 24, 2012 at 18:06 UTC
by BrowserUk (Patriarch) on May 25, 2012 at 03:34 UTC
by JohnRS (Scribe) on May 25, 2012 at 11:20 UTC