Re: Perl threads loss of performance with system call
by dave_the_m (Monsignor) on Jul 10, 2021 at 08:58 UTC
|
I can't reproduce the issue here (linux, 12 core x86_64; I tried various perls from 5.10.1 to 5.34.0).
With sleep 1, I get a total running time of the program of 1m 40s (consistent with each thread in parallel doing 100 1s sleeps), and with sleep 0.0001, I get a total running time of 2.5s, which seems reasonable for all the overhead of setting up and tearing down 50 threads plus the overhead of doing 50000 calls to system().
Dave. | [reply] |
|
I was yesterday on the #perl-help channel on irc.perl.org and 2-3 people tried and observed the same behaviour.
When you run the first one (sleep 1), all the threads are spawned nearly instantaneously, while with the second (sleep 0.0001), you see considerable delay in the spawning.
| [reply] |
|
| [reply] |
Re: Perl threads loss of performance with system call
by bliako (Abbot) on Jul 12, 2021 at 12:42 UTC
|
I run above code on fedora and I didn't observe any problem, but then again I don't know what is proper! If you provide a Short, Self-Contained, Correct Example which makes it clear to distinguish between correct and incorrect results I will give it a try.
You should be aware that there are quite a few ways of running a system command via Perl. And some of them involve running it via a shell, and some don't. Also a system command involves a fork. An exec does not, it replaces the current process with the system command. And of course there are various higher-level ways of spawning a system command like IPC::Run. Do you see the problem with all of them? or some of them? Have you also tried to run parallel sleep() from within the shell, avoiding perl to see if that's a perl problem?
bw, bliako
Update:
for long sleep:
real 1m40.444s
user 0m1.271s
sys 0m8.883s
for short sleep:
real 0m1.901s
user 0m1.442s
sys 0m6.836s
Update2: use dmesg and check log files to see if the OS complains that certain quota on resources have been exceeded and it can't allocate more (perhaps too many open filehandles? a backticks-system will return back the STDOUT of the command). Use strace -fp PID to see exactly what's going on if you can seave through the maze of output.). Perhaps relevant: GNU-Parallel is an *excellent* tool for spawning commands in parallel and it is written in Perl.
| [reply] [d/l] [select] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 12, 2021 at 12:43 UTC
|
| [reply] [d/l] |
|
What "a pure perl solution" would be? Cause the program I have to call is a program that performs a very specific analysis, and I cannot just rewrite it in Perl. Also, if you take my code, and replace the backticks with qx or system, the result is always the same.
| [reply] |
|
| [reply] |
|
|
|
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 10, 2021 at 08:11 UTC
|
| [reply] |
|
Hi karlgoethebier,
unfortunately yes. And it runs smoothly with fork(), but I want to possibly handle it with threads.
| [reply] |
|
| [reply] [d/l] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 10, 2021 at 08:58 UTC
|
«… within each thread you execute a shell command…»
BTW, I’m not so sure if this is a good idea. Probably there might be a solution in pure Perl.
«The Crux of the Biscuit is the Apostrophe»
| [reply] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 11, 2021 at 16:13 UTC
|
| [reply] [d/l] |
|
Please find 2 examples, based on karlgoethebier's demonstration.
Example 1:
The posix_exit Hobo option is beneficial on Unix platforms. Be sure to close any file handles opened by the worker. The void_context option is helpful to reduce IPC overhead if not wanting result from workers.
#!/usr/bin/env perl
use strict;
use warnings;
use MCE::Hobo;
use feature qw(say);
use constant AMOUNT => 0.001;
my $cores = MCE::Util::get_ncpu();
MCE::Hobo->init(
max_workers => $cores,
void_context => 1,
posix_exit => 1,
);
sub consumer_task {
my $id = shift;
say qq($id ) . MCE::Hobo->pid();
qx( sleep @{[ AMOUNT ]} );
}
for my $id ( 1..800 ) {
my $hobo = MCE::Hobo->create( \&consumer_task, $id );
}
MCE::Hobo->wait_all();
Example 2:
Here, we minimize the spawning of Hobo workers and instead use a channel or a queue to send/receive input.
#!/usr/bin/env perl
use strict;
use warnings;
use MCE::Hobo;
use MCE::Channel;
use feature qw(say);
use constant AMOUNT => 0.001;
my $cores = MCE::Util::get_ncpu();
my $queue = MCE::Channel->new();
MCE::Hobo->init(
void_context => 1,
posix_exit => 1,
);
# Consumers
sub consumer_task {
while ( my $input = $queue->dequeue() ) {
say qq($input->{id} ) . MCE::Hobo->pid();
qx( sleep @{[ AMOUNT ]};)
}
}
MCE::Hobo->create( \&consumer_task ) for 1..$cores;
# Producer
for ( 1..800 ) {
$queue->enqueue({ id => $_ });
}
$queue->end();
MCE::Hobo->wait_all();
| [reply] [d/l] [select] |
|
#!/usr/bin/env perl
use strict;
use warnings;
use MCE::Hobo;
use MCE::Channel;
use feature qw(say);
use constant AMOUNT => 0.001;
my $cores = MCE::Util::get_ncpu();
my $queue = MCE::Channel->new();
MCE::Hobo->init(
void_context => 1,
posix_exit => 1,
);
# Consumers
sub consumer_task {
while ( my $input = $queue->recv() ) {
my $id = $input->{id};
$queue->send2( MCE::Hobo->pid() );
qx( sleep @{[ AMOUNT ]};)
}
}
MCE::Hobo->create( \&consumer_task ) for 1..$cores;
# Producer
for my $id ( 1..800 ) {
$queue->send({ id => $id });
my $hobo_pid = $queue->recv2();
say qq($id ) . $hobo_pid;
}
$queue->end();
MCE::Hobo->wait_all();
| [reply] [d/l] |
|
| [reply] [d/l] |
|
Quite smoothly, but it exits before everything is terminated
| [reply] |
|
| [reply] |
|
MCE::Hobo->wait_all();
| [reply] [d/l] |
|
Re: Perl threads loss of performance with system call
by bliako (Abbot) on Jul 13, 2021 at 23:19 UTC
|
I have observed that the terms "short" and "long" sleep-times, as you put it, depend on the number of threads (=time to fill the @pool=when to join). For example in my system, a sleep of 0.7s will display delays but only when the number of threads is greater than 300. While a sleep of 1s does show these problems but for 500+ threads. This code my @pool = map { threads->create(\&mysub) } 1 .. 50; creates a thread, runs it and then adds it to the @pool and repeats. The important thing to notice is that the thread starts execution immediately while others are still being created. And when thread execution involves system (=fork+exec+waitpid) which has to exit very shortly, somewhere there is a race, given that threads and fork involve replicate program state. Now this is a rough sketch and probably silly, I am very far from being an expert on Perl gastroenterology.
I have a possible workaround, which is lame in that it will probably take longer to finish ... Just make the created threads to wait for the pool to fill up and then do the actual work (system command). That can be achieved with a shared flag like below:
use threads;
use threads::shared;
use strict;
use Time::HiRes qw/clock_gettime CLOCK_REALTIME usleep/;
my $flag:shared = 1;
print "starting at ".clock_gettime(CLOCK_REALTIME)."\n";
my @pool;
for(1..50){
push @pool, threads->create(\&mysub);
}
print "giving the flag at ".clock_gettime(CLOCK_REALTIME)."\n";
$flag = 0; # all threads go!
$_->join for (@pool);
sub mysub {
my $tid = threads->tid;
while( $flag ){
print "thread $tid created and waiting for the flag at ".clock_get
+time(CLOCK_REALTIME)."\n";
usleep(1000);
}
#my $x = 0.0001 + rand(0.0015);
my $x = 0.001;
#my $x = rand() >= 0.5 ? 0.001 : 1;
print "thread $tid is working with sleep=$x, at ".clock_gettime(CL
+OCK_REALTIME)."\n";
my $id;
for my $i (1..100){
$id = "i=$i, tid=$tid";
# print `
`
echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut -
+d' ' -f39\`": "; date +%s.%N
sleep $x
echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut -
+d' ' -f39\`": "; date +%s.%N
`;
}
print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM
+E)."\n";
}
__END__
An explanation of what's going on should probably involve the fact that system involves fork, exec and waitpid (see Learning Perl, ch 14). And that both threads and fork replicate the program state. And choroba's citing the doc here:
Thinking of mixing fork() and threads? Please lie down and wait until the feeling passes. Be aware that the semantics of fork() vary between platforms. For example, some Unix systems copy all the current threads into the child process, while others only copy the thread that called fork(). You have been warned!
bw, bliako | [reply] [d/l] [select] |
|
This is what I actually tried to do at the very beginning.
However, if you keep an eye on your "top", you will see that at most 2-3 threads are running concurrently at any time, while most of them are inactive. Indeed, the total execution time is longer with "sleep 0.001" (or echo A), than it is with "sleep 1".
| [reply] |
|
| [reply] [d/l] [select] |
Re: Perl threads loss of performance with system call
by perlfan (Parson) on Jul 10, 2021 at 05:33 UTC
|
> why?
Because Perl doesn't have real lightweight, shared memory threads like you post is making me think might think (through perhaps no fault of your own).
> no, really, why?
You haven't mentioned anything about the computing environment - virtual, bare metal, one CPU, many vCPUs, etc? Can you replicate this via bash backgrounding child processes?
| [reply] [d/l] |
|
Hi perlfan,
the computer is a Fedora linux machine with 56 cores.
| [reply] |
|
| [reply] [d/l] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 13, 2021 at 12:05 UTC
|
use IPC::Run qw( run );
use feature qw(say);
say alien( q(echo), q(fubar) );
sub alien {
my ($command, $arg) = @_;
run qq($command $arg);
}
As also recommended by bliako.
«The Crux of the Biscuit is the Apostrophe»
| [reply] [d/l] [select] |
|
| [reply] |
|
I think I see the same effect on fedora! But that may be due to terminal's sequential stdout!
Here is a script which can help you debug this further, it prints start and stop time of shell system command and which cpu the shell was at the beginning and at the end (or at least that's what I believe as I am not into the Linux scheduler at all). I think it is better to rely on these numbers rather than on the frequency you see "spawned 10" on the terminal, something very unreliable! You can process these timings and see for yourself if there is delay from creating to sleeping. I don't see it, e.g. the 1st shell-command is run at 1626196220.813910751 which is, time-wise, near spawned 3 at 1626196220.81309 I also observe that spawned 7 at 1626196220.82161 and start(i=1, tid=7/449711) on cpu 0: 1626196220.828573978 are very near as well.
use threads;
use strict;
use Time::HiRes qw/clock_gettime CLOCK_REALTIME/;
# use this to control which CPU you assign this script
# (and in my system also the shell system command)
#use Sys::CpuAffinity;
#my $success = Sys::CpuAffinity::setAffinity($$, [1]);
print "starting at ".clock_gettime(CLOCK_REALTIME)."\n";
my @pool;
for(1..50){
push @pool, threads->create(\&mysub);
}
$_->join for (@pool);
sub mysub {
print "spawned " . threads->tid . " at ".clock_gettime(CLOCK_REALT
+IME)."\n";
#for(1..100){ select(undef, undef, undef, 0.001); }
#my $x = 0.0001 + rand(0.0015);
my $x = 0.001;
my $id;
for my $i (1..100){
$id = "i=$i, tid=".threads->tid;
print `
echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut -
+d' ' -f39\`": "; date +%s.%N
sleep $x
echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut -
+d' ' -f39\`": "; date +%s.%N
`;
}
print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM
+E)."\n";
}
__END__
starting at 1626196220.80981
spawned 1 at 1626196220.81077
spawned 2 at 1626196220.81192
spawned 3 at 1626196220.81309
spawned 4 at 1626196220.81466
spawned 5 at 1626196220.81614
spawned 6 at 1626196220.818
start(i=1, tid=1/449647) on cpu 3: 1626196220.813910751
stop(i=1, tid=1/449647) on cpu 3: 1626196220.818011634
start(i=1, tid=2/449649) on cpu 1: 1626196220.815133617
stop(i=1, tid=2/449649) on cpu 3: 1626196220.819228058
start(i=1, tid=3/449654) on cpu 4: 1626196220.816404247
stop(i=1, tid=3/449654) on cpu 2: 1626196220.820722313
spawned 7 at 1626196220.82161
spawned 8 at 1626196220.8235
start(i=1, tid=4/449665) on cpu 6: 1626196220.818088441
stop(i=1, tid=4/449665) on cpu 4: 1626196220.825072117
spawned 9 at 1626196220.82603
spawned 10 at 1626196220.82844
start(i=1, tid=5/449675) on cpu 4: 1626196220.820177902
start(i=1, tid=6/449684) on cpu 4: 1626196220.824213073
...
bw, bliako | [reply] [d/l] [select] |
|
|
Thanks. But i don't believe so much in animated gifs but in Benchmark. And in the recording i see your code and not mine.
And probably we should better forget this strange loop with sleep and echo and talk about your real command.
And BTW, from my sleep manpage:
"The sleep command will accept and honor a non-integer number of specified
seconds (with a `.' character as a decimal point). This is a non-portable extension, and its use will nearly guarantee that a shell script will
not execute properly on another system."
Holy Joe!
Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] |
Re: Perl threads loss of performance with system call
by jflopezfernandez (Novice) on Jul 10, 2021 at 17:36 UTC
|
I wonder if the problem is that when you call sleep;3 with an argument of one second, the processor considers this long enough to switch context and continue with the next threads, which is not the case when the argument is only one ten-thousandth of a second. This is exactly what happens when the amount of time to sleep for is less than the clock resolution.
The sleep functions (meaning both the C library and system functions, on both Linux and FreeBSD) all require an integral argument as their sole calling parameter, so I think what's happening is that the argument is being coerced into a zero, the thread is calling
sleep with an argument of zero, and thus the process is sequentially calling sleep two threads at a time, as that is its upper limit on an Intel processor with hyperthreading.
Most decidedly not a Perl guru
| [reply] [d/l] [select] |
|
All the shells I have installed (bash, dash, tcsh) support floating point arguments to sleep and they really try to sleep for the specified non-integer time. (See for example lib/sh/ufuncs.c in bash).
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
|
Thanks for your reply! The "sleep" was just an example. You can observe the same behaviour with any other command that gets quickly executed (for example, "echo A" will mimic the same behaviour).
| [reply] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 15, 2021 at 06:48 UTC
|
«… mixing fork() and threads…»
I guess bliako is right. I had always a bad feeling with this idea. But I have another idea: You are on Fedora, right? Why don’t you simply fork? You can do this «by hand» or use MCE or Parallel::ForkManager. You know this. And as we now know your real command it would be also nice to see your real code. Else debugging is really hard and just a guessing game. One more thing: I don’t know anything about the lib(s) you use. Do you really need the full «framework»? And is it possible for you to port something you need from C to Perl? To get the idea you might take a look at List::MoreUtils::PP and compare it with List::MoreUtils. Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] |
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 16, 2021 at 06:41 UTC
|
| [reply] |
Re: Perl threads loss of performance with system call
by Anonymous Monk on Jul 10, 2021 at 03:48 UTC
|
Why are you joining threads? Control the stack size? | [reply] |
|
| [reply] |
A reply falls below the community's threshold of quality. You may see it by logging in. |