Perl threads loss of performance with system call

daniel85 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Perl threads loss of performance with system call by dave_the_m (Monsignor) on Jul 10, 2021 at 08:58 UTC
I can't reproduce the issue here (linux, 12 core x86_64; I tried various perls from 5.10.1 to 5.34.0). With sleep 1, I get a total running time of the program of 1m 40s (consistent with each thread in parallel doing 100 1s sleeps), and with sleep 0.0001, I get a total running time of 2.5s, which seems reasonable for all the overhead of setting up and tearing down 50 threads plus the overhead of doing 50000 calls to system(). Dave.	[reply]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 10, 2021 at 09:19 UTC
I was yesterday on the #perl-help channel on irc.perl.org and 2-3 people tried and observed the same behaviour. When you run the first one (sleep 1), all the threads are spawned nearly instantaneously, while with the second (sleep 0.0001), you see considerable delay in the spawning.	[reply]
Re^3: Perl threads loss of performance with system call by bliako (Abbot) on Jul 12, 2021 at 12:41 UTC
and what OS did they run the code in?	[reply]
Re: Perl threads loss of performance with system call by bliako (Abbot) on Jul 12, 2021 at 12:42 UTC
I run above code on fedora and I didn't observe any problem, but then again I don't know what is proper! If you provide a Short, Self-Contained, Correct Example which makes it clear to distinguish between correct and incorrect results I will give it a try. You should be aware that there are quite a few ways of running a system command via Perl. And some of them involve running it via a shell, and some don't. Also a system command involves a fork. An exec does not, it replaces the current process with the system command. And of course there are various higher-level ways of spawning a system command like IPC::Run. Do you see the problem with all of them? or some of them? Have you also tried to run parallel sleep() from within the shell, avoiding perl to see if that's a perl problem? bw, bliako Update: `for long sleep: real 1m40.444s user 0m1.271s sys 0m8.883s for short sleep: real 0m1.901s user 0m1.442s sys 0m6.836s` [download] Update2: use `dmesg` and check log files to see if the OS complains that certain quota on resources have been exceeded and it can't allocate more (perhaps too many open filehandles? a backticks-system will return back the STDOUT of the command). Use `strace -fp PID` to see exactly what's going on if you can seave through the maze of output.). Perhaps relevant: GNU-Parallel is an excellent tool for spawning commands in parallel and it is written in Perl.	[reply] [d/l] [select]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 12, 2021 at 12:43 UTC
What is the real command that you want to call via qx or system? And did you consider IPC::Run? See Re: Cheapest way for multiple external commands for some hints. For further experiments you may consider this variation of my MCE::Hobo example: Read more... (945 Bytes) Moreover, I am of the opinion that a pure Perl solution would be best. ŤThe Crux of the Biscuit is the Apostropheť	[reply] [d/l]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 13, 2021 at 20:41 UTC
What "a pure perl solution" would be? Cause the program I have to call is a program that performs a very specific analysis, and I cannot just rewrite it in Perl. Also, if you take my code, and replace the backticks with qx or system, the result is always the same.	[reply]
Re^3: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 14, 2021 at 01:21 UTC
I guess you are a little bit resistant to advice. ŤBeratungsresistentť as we say in German. Please provide more details. Else I abandon all hope. Best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re^4: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 14, 2021 at 13:07 UTC
Re^5: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 14, 2021 at 20:33 UTC
Some notes below your chosen depth have not been shown here
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 10, 2021 at 08:11 UTC
Q: Is your ŤProblemť really embarrassingly parallel? ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 10, 2021 at 08:25 UTC
Hi karlgoethebier, unfortunately yes. And it runs smoothly with fork(), but I want to possibly handle it with threads.	[reply]
Re^3: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 10, 2021 at 08:47 UTC
ŤI want to possibly handle it with threadsť Probably you should consider to use MCE. It forks but you can say `use threads;` and it works as designed. ŤThe Crux of the Biscuit is the Apostropheť	[reply] [d/l]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 10, 2021 at 08:58 UTC
Ť within each thread you execute a shell commandť BTW, Im not so sure if this is a good idea. Probably there might be a solution in pure Perl. ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 11, 2021 at 16:13 UTC
How does this perform on your box? Read more... (704 Bytes) ŤThe Crux of the Biscuit is the Apostropheť	[reply] [d/l]
Re^2: Perl threads loss of performance with system call by Anonymous Monk on Jul 14, 2021 at 05:54 UTC
Please find 2 examples, based on karlgoethebier's demonstration. Example 1: The posix_exit Hobo option is beneficial on Unix platforms. Be sure to close any file handles opened by the worker. The void_context option is helpful to reduce IPC overhead if not wanting result from workers. `#!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); MCE::Hobo->init( max_workers => $cores, void_context => 1, posix_exit => 1, ); sub consumer_task { my $id = shift; say qq($id ) . MCE::Hobo->pid(); qx( sleep @{[ AMOUNT ]} ); } for my $id ( 1..800 ) { my $hobo = MCE::Hobo->create( \&consumer_task, $id ); } MCE::Hobo->wait_all();` [download] Example 2: Here, we minimize the spawning of Hobo workers and instead use a channel or a queue to send/receive input. #!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use MCE::Channel; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); my $queue = MCE::Channel->new(); MCE::Hobo->init( void_context => 1, posix_exit => 1, ); # Consumers sub consumer_task { while ( my $input = $queue->dequeue() ) { say qq($input->{id} ) . MCE::Hobo->pid(); qx( sleep @{[ AMOUNT ]};) } } MCE::Hobo->create( \&consumer_task ) for 1..$cores; # Producer for ( 1..800 ) { $queue->enqueue({ id => $_ }); } $queue->end(); MCE::Hobo->wait_all(); [download]	[reply] [d/l] [select]
Re^3: Perl threads loss of performance with system call by Anonymous Monk on Jul 14, 2021 at 06:09 UTC
The channel object is bidirectional. Therefore, the worker can send its PID value to the producer. This has the effect of max one item in the channel. The producer writes to STDOUT like the demonstration by karlgoethebier. #!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use MCE::Channel; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); my $queue = MCE::Channel->new(); MCE::Hobo->init( void_context => 1, posix_exit => 1, ); # Consumers sub consumer_task { while ( my $input = $queue->recv() ) { my $id = $input->{id}; $queue->send2( MCE::Hobo->pid() ); qx( sleep @{[ AMOUNT ]};) } } MCE::Hobo->create( \&consumer_task ) for 1..$cores; # Producer for my $id ( 1..800 ) { $queue->send({ id => $id }); my $hobo_pid = $queue->recv2(); say qq($id ) . $hobo_pid; } $queue->end(); MCE::Hobo->wait_all(); [download]	[reply] [d/l]
Re^3: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 14, 2021 at 10:00 UTC
Thanks. This looks good. What I never understood is why to call `MCE::Hobo->wait_all();` in this context. As it seems that all the hobos terminate correctly in this ~~fubar~~ loop without the call. ŤThe Crux of the Biscuit is the Apostropheť	[reply] [d/l]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 11, 2021 at 17:29 UTC
Quite smoothly, but it exits before everything is terminated	[reply]
Re^3: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 11, 2021 at 19:45 UTC
Strange. On my Mac it works as designed. 50 threads and as I have 12 cores 4 x 12 plus 2. It can be seen in the output as well as in the process table. For the moment Im at a loss. ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re^3: Perl threads loss of performance with system call by Anonymous Monk on Jul 14, 2021 at 04:54 UTC
Uncommenting the wait_all line is necessary to ensure workers have completed and exited. `MCE::Hobo->wait_all();` [download]	[reply] [d/l]
Re^4: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 14, 2021 at 21:46 UTC
Re: Perl threads loss of performance with system call by bliako (Abbot) on Jul 13, 2021 at 23:19 UTC
I have observed that the terms "short" and "long" sleep-times, as you put it, depend on the number of threads (=time to fill the @pool=when to join). For example in my system, a sleep of 0.7s will display delays but only when the number of threads is greater than 300. While a sleep of 1s does show these problems but for 500+ threads. This code `my @pool = map { threads->create(\&mysub) } 1 .. 50;` creates a thread, runs it and then adds it to the @pool and repeats. The important thing to notice is that the thread starts execution immediately while others are still being created. And when thread execution involves `system` (=`fork+exec+waitpid`) which has to exit very shortly, somewhere there is a race, given that threads and fork involve replicate program state. Now this is a rough sketch and probably silly, I am very far from being an expert on Perl gastroenterology. I have a possible workaround, which is lame in that it will probably take longer to finish ... Just make the created threads to wait for the pool to fill up and then do the actual work (system command). That can be achieved with a `shared` flag like below: use threads; use threads::shared; use strict; use Time::HiRes qw/clock_gettime CLOCK_REALTIME usleep/; my $flag:shared = 1; print "starting at ".clock_gettime(CLOCK_REALTIME)."\n"; my @pool; for(1..50){ push @pool, threads->create(\&mysub); } print "giving the flag at ".clock_gettime(CLOCK_REALTIME)."\n"; $flag = 0; # all threads go! $_->join for (@pool); sub mysub { my $tid = threads->tid; while( $flag ){ print "thread $tid created and waiting for the flag at ".clock_get +time(CLOCK_REALTIME)."\n"; usleep(1000); } #my $x = 0.0001 + rand(0.0015); my $x = 0.001; #my $x = rand() >= 0.5 ? 0.001 : 1; print "thread $tid is working with sleep=$x, at ".clock_gettime(CL +OCK_REALTIME)."\n"; my $id; for my $i (1..100){ $id = "i=$i, tid=$tid"; # print ` ` echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat \| cut - +d' ' -f39\`": "; date +%s.%N sleep $x echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat \| cut - +d' ' -f39\`": "; date +%s.%N `; } print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM +E)."\n"; } __END__ [download] An explanation of what's going on should probably involve the fact that `system` involves `fork`, `exec` and `waitpid` (see Learning Perl, ch 14). And that both threads and fork replicate the program state. And choroba's citing the doc here: Thinking of mixing fork() and threads? Please lie down and wait until the feeling passes. Be aware that the semantics of fork() vary between platforms. For example, some Unix systems copy all the current threads into the child process, while others only copy the thread that called fork(). You have been warned! bw, bliako	[reply] [d/l] [select]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 14, 2021 at 12:25 UTC
This is what I actually tried to do at the very beginning. However, if you keep an eye on your "top", you will see that at most 2-3 threads are running concurrently at any time, while most of them are inactive. Indeed, the total execution time is longer with "sleep 0.001" (or echo A), than it is with "sleep 1".	[reply]
Re^3: Perl threads loss of performance with system call by bliako (Abbot) on Jul 14, 2021 at 12:34 UTC
i use `htop`. I did not see what you describe. All cores were active. Still we are talking about very small time intervals. In any event you now have some tools to debug this and confirm any suspicions you may have. For example `taskset` will report on the current cpu (relocations are common though). bw, bliako	[reply] [d/l] [select]
Re: Perl threads loss of performance with system call by perlfan (Parson) on Jul 10, 2021 at 05:33 UTC
> why? Because Perl doesn't have real lightweight, shared memory threads like you post is making me think might think (through perhaps no fault of your own). > no, really, why? You haven't mentioned anything about the computing environment - virtual, bare metal, one CPU, many vCPUs, etc? Can you replicate this via `bash` backgrounding child processes?	[reply] [d/l]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 10, 2021 at 08:26 UTC
Hi perlfan, the computer is a Fedora linux machine with 56 cores.	[reply]
Re^3: Perl threads loss of performance with system call by perlfan (Parson) on Jul 13, 2021 at 05:03 UTC
Trust me, if you want to do real threading, `perl` is the wrong tool. If you're cool with fork, it is. Check out MCE as suggested, or my personal hammer, Parallel::ForkManager. PDL::ParallelCPU looks interesting, but I have not tried it out. That said, if you want to feel "perlish" yet have real SMP threading, checkout out Qore.	[reply] [d/l]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 13, 2021 at 12:05 UTC
You may consider this instead of your `mysub`: `use IPC::Run qw( run ); use feature qw(say); say alien( q(echo), q(fubar) ); sub alien { my ($command, $arg) = @_; run qq($command $arg); }` [download] As also recommended by bliako. ŤThe Crux of the Biscuit is the Apostropheť	[reply] [d/l] [select]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 13, 2021 at 15:32 UTC
I tried, same behaviour. So, just to make it clear what "I see", I recorded my screen and made 2 gifs. Here they are: http://daniel85.rf.gd/slow.gif <-- with sleep 0.001 http://daniel85.rf.gd/fast.gif <-- with sleep 1 (or 0.5, is the same)	[reply]
Re^3: Perl threads loss of performance with system call by bliako (Abbot) on Jul 13, 2021 at 17:19 UTC
I think I see the same effect on fedora! But that may be due to terminal's sequential stdout! Here is a script which can help you debug this further, it prints start and stop time of shell system command and which cpu the shell was at the beginning and at the end (or at least that's what I believe as I am not into the Linux scheduler at all). I think it is better to rely on these numbers rather than on the frequency you see "`spawned 10`" on the terminal, something very unreliable! You can process these timings and see for yourself if there is delay from creating to sleeping. I don't see it, e.g. the 1st shell-command is run at 1626196220.813910751 which is, time-wise, near `spawned 3 at 1626196220.81309` I also observe that `spawned 7 at 1626196220.82161` and `start(i=1, tid=7/449711) on cpu 0: 1626196220.828573978` are very near as well. use threads; use strict; use Time::HiRes qw/clock_gettime CLOCK_REALTIME/; # use this to control which CPU you assign this script # (and in my system also the shell system command) #use Sys::CpuAffinity; #my $success = Sys::CpuAffinity::setAffinity($$, [1]); print "starting at ".clock_gettime(CLOCK_REALTIME)."\n"; my @pool; for(1..50){ push @pool, threads->create(\&mysub); } $_->join for (@pool); sub mysub { print "spawned " . threads->tid . " at ".clock_gettime(CLOCK_REALT +IME)."\n"; #for(1..100){ select(undef, undef, undef, 0.001); } #my $x = 0.0001 + rand(0.0015); my $x = 0.001; my $id; for my $i (1..100){ $id = "i=$i, tid=".threads->tid; print ` echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat \| cut - +d' ' -f39\`": "; date +%s.%N sleep $x echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat \| cut - +d' ' -f39\`": "; date +%s.%N `; } print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM +E)."\n"; } __END__ starting at 1626196220.80981 spawned 1 at 1626196220.81077 spawned 2 at 1626196220.81192 spawned 3 at 1626196220.81309 spawned 4 at 1626196220.81466 spawned 5 at 1626196220.81614 spawned 6 at 1626196220.818 start(i=1, tid=1/449647) on cpu 3: 1626196220.813910751 stop(i=1, tid=1/449647) on cpu 3: 1626196220.818011634 start(i=1, tid=2/449649) on cpu 1: 1626196220.815133617 stop(i=1, tid=2/449649) on cpu 3: 1626196220.819228058 start(i=1, tid=3/449654) on cpu 4: 1626196220.816404247 stop(i=1, tid=3/449654) on cpu 2: 1626196220.820722313 spawned 7 at 1626196220.82161 spawned 8 at 1626196220.8235 start(i=1, tid=4/449665) on cpu 6: 1626196220.818088441 stop(i=1, tid=4/449665) on cpu 4: 1626196220.825072117 spawned 9 at 1626196220.82603 spawned 10 at 1626196220.82844 start(i=1, tid=5/449675) on cpu 4: 1626196220.820177902 start(i=1, tid=6/449684) on cpu 4: 1626196220.824213073 ... [download] bw, bliako	[reply] [d/l] [select]
Re^4: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 13, 2021 at 20:38 UTC
Re^3: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 13, 2021 at 17:05 UTC
Thanks. But i don't believe so much in animated gifs but in Benchmark. And in the recording i see your code and not mine. And probably we should better forget this strange loop with sleep and echo and talk about your real command. And BTW, from my sleep manpage: "The sleep command will accept and honor a non-integer number of specified seconds (with a `.' character as a decimal point). This is a non-portable extension, and its use will nearly guarantee that a shell script will not execute properly on another system." Holy Joe! Best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re: Perl threads loss of performance with system call by jflopezfernandez (Novice) on Jul 10, 2021 at 17:36 UTC
I wonder if the problem is that when you call sleep;3 with an argument of one second, the processor considers this long enough to switch context and continue with the next threads, which is not the case when the argument is only one ten-thousandth of a second. This is exactly what happens when the amount of time to sleep for is less than the clock resolution. The sleep functions (meaning both the C library and system functions, on both Linux and FreeBSD) all require an integral argument as their sole calling parameter, so I think what's happening is that the argument is being coerced into a zero, the thread is calling `sleep` with an argument of zero, and thus the process is sequentially calling sleep two threads at a time, as that is its upper limit on an Intel processor with hyperthreading. Read more... (1067 Bytes) Most decidedly not a Perl guru	[reply] [d/l] [select]
Re^2: Perl threads loss of performance with system call by choroba (Cardinal) on Jul 10, 2021 at 18:29 UTC
All the shells I have installed (bash, dash, tcsh) support floating point arguments to sleep and they really try to sleep for the specified non-integer time. (See for example lib/sh/ufuncs.c in bash). `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l]
Re^2: Perl threads loss of performance with system call by daniel85 (Novice) on Jul 10, 2021 at 19:11 UTC
Thanks for your reply! The "sleep" was just an example. You can observe the same behaviour with any other command that gets quickly executed (for example, "echo A" will mimic the same behaviour).	[reply]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 15, 2021 at 06:48 UTC
Ť mixing fork() and threadsť I guess bliako is right. I had always a bad feeling with this idea. But I have another idea: You are on Fedora, right? Why dont you simply fork? You can do this Ťby handť or use MCE or Parallel::ForkManager. You know this. And as we now know your real command it would be also nice to see your real code. Else debugging is really hard and just a guessing game. One more thing: I dont know anything about the lib(s) you use. Do you really need the full Ťframeworkť? And is it possible for you to port something you need from C to Perl? To get the idea you might take a look at List::MoreUtils::PP and compare it with List::MoreUtils. Best regards, Karl ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re: Perl threads loss of performance with system call by karlgoethebier (Abbot) on Jul 16, 2021 at 06:41 UTC
Dont give up ŤThe Crux of the Biscuit is the Apostropheť	[reply]
Re: Perl threads loss of performance with system call by Anonymous Monk on Jul 10, 2021 at 03:48 UTC
Why are you joining threads? Control the stack size?	[reply]
Re^2: Perl threads loss of performance with system call by ikegami (Patriarch) on Jul 11, 2021 at 05:25 UTC
It waits for them to finish. Seeking work! You can reach me at ikegami@adaelis.com	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.