Re: parallel process on remote machines, read results and handle timeout of those processes

In the script I have, all the filehandles are set to non-blocking and `sysread` is used to read the output. Why use non-blocking? What about using `while (<FH>) { push @results, $_; }`? What is the difference?

If you go the blocking route, and the first blocking handle you attempt to read from fails to respond, you'll block forever.
Even if it eventually responds, you wasted a lot of time waiting for that machine when you could have been reading the responses from other machines that respond more quickly.
By going the non-blocking route, you will get the data from whichever machine responds quickest, as soon as it is available, and thus minimise the overall time required to gather all the data.
The downside is that you have to read in small, fixed-size chunks and reassemble the output yourself.
How do I do the timeout for the filehandles?
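To make the difference concrete, here is a minimal sketch of the non-blocking approach. It assumes a Unix-like system (Perl's `select` only works on sockets on Windows), uses local `perl -e` one-liners as hypothetical stand-ins for the remote commands, and the names `%cmd`, `%buffer` and `@finished` are made up for illustration. Each handle is switched to non-blocking mode, `IO::Select` reports which handles have something to read, and every chunk returned by `sysread` is appended to that command's own buffer.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Errno qw(EAGAIN);

# Hypothetical stand-ins for the remote commands: local perl one-liners
# that wait different amounts of time before printing.
my %cmd = (
    slow   => 'select(undef, undef, undef, 0.6); print "slow $_\n" for 1 .. 3',
    medium => 'select(undef, undef, undef, 0.3); print "medium $_\n" for 1 .. 3',
    fast   => 'print "fast $_\n" for 1 .. 3',
);

my (%buffer, %name_of, @finished);
my $sel = IO::Select->new;

# Start every command without waiting for any of them.
for my $name (sort keys %cmd) {
    open(my $fh, '-|', $^X, '-e', $cmd{$name})
        or die "Cannot start $name: $!";
    $fh->blocking(0);              # sysread now returns at once instead of waiting
    $name_of{fileno $fh} = $name;
    $buffer{$name} = '';
    $sel->add($fh);                # IO::Select keeps the handle alive
}

while ($sel->count) {
    # Wait until at least one handle has data (or EOF) ready.
    for my $fh ($sel->can_read) {
        my $name = $name_of{fileno $fh};
        my $n = sysread($fh, my $chunk, 1024);
        if (!defined $n) {
            next if $! == EAGAIN;  # nothing there after all; back to select
            die "read from $name failed: $!";
        }
        if ($n == 0) {             # EOF: that command has finished
            $sel->remove($fh);
            close $fh;
            push @finished, $name;
            next;
        }
        $buffer{$name} .= $chunk;  # reassemble the output piece by piece
    }
}

print "finished in order: @finished\n";
print $buffer{$_} for @finished;
```

Because the fast command is never queued behind the slow one, its output is complete as soon as it arrives. With `while (<FH>)` on a blocking handle, whichever handle you happened to read first would hold up all the others until it finished.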

Basically, you only need one timeout.
With non-blocking handles, you can fire off the commands to all the machines without waiting for any of them. You then start your timer (record the current time).
Then each time the select loop fires, because data is available, you read that data and add it to the buffer for the appropriate handle. Then you check how long the loop has been running and if it has exceeded your timeout, quit the select loop and close all your handles.
This means that the first machine you send the command to will have had very slightly longer to respond than the last, but as you didn't wait for any responses until after recording your start time, the difference will be minimal; in effect, most of the machines get a few milliseconds longer than the timeout. This should not be a problem.
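The single-timer scheme described above might look something like this in Perl. Again this is only a sketch under stated assumptions: a Unix-like system, local one-liners standing in for the remote commands, and a made-up 1-second `$timeout`. One small difference from the description: the remaining time is passed to `can_read`, so the loop also wakes up and gives up when no handle produces anything at all, rather than only checking the clock after data arrives. Commands still running when time is up are sent SIGTERM here; the script in this thread uses `kill 1` (SIGHUP) instead.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;
use Errno qw(EAGAIN);
use Time::HiRes qw(time);

my $timeout = 1.0;    # seconds for the whole batch (assumed value)

# Hypothetical commands; 'hung' cannot finish within the timeout.
my %cmd = (
    quick => 'print "quick done\n"',
    hung  => 'sleep 30; print "too late\n"',
);

my (%buffer, %name_of, %pid_of);
my $sel = IO::Select->new;

# Fire off every command first, without waiting for any of them.
for my $name (sort keys %cmd) {
    my $pid = open(my $fh, '-|', $^X, '-e', $cmd{$name})
        or die "Cannot start $name: $!";
    $fh->blocking(0);
    $name_of{fileno $fh} = $name;
    $pid_of{$name}       = $pid;
    $buffer{$name}       = '';
    $sel->add($fh);
}

# One timer for all commands, started after every command has been launched.
my $start = time;
while ($sel->count) {
    my $remaining = $start + $timeout - time;
    last if $remaining <= 0;
    for my $fh ($sel->can_read($remaining)) {
        my $name = $name_of{fileno $fh};
        my $n = sysread($fh, my $chunk, 1024);
        if (!defined $n) {
            next if $! == EAGAIN;
            die "read from $name failed: $!";
        }
        if ($n == 0) {                 # EOF: finished in time
            $sel->remove($fh);
            close $fh;
            next;
        }
        $buffer{$name} .= $chunk;
    }
}

# Anything still registered has run out of time: kill it and close up.
my @timed_out;
for my $fh ($sel->handles) {
    my $name = $name_of{fileno $fh};
    push @timed_out, $name;
    kill 'TERM', $pid_of{$name};
    $sel->remove($fh);
    close $fh;                         # reaps the killed child
}
print STDERR "Timed out: @timed_out\n" if @timed_out;
```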

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.


Re^2: parallel process on remote machines, read results and handle timeout of those processes by x12345 (Novice) on Oct 31, 2014 at 10:50 UTC

Thanks for your explanation. It helps me understand better. That is why the script reads 1024 bytes at a time, in a loop, for each filehandle: a non-blocking filehandle has to be read in chunks like this. And `while (<FH>)` is more for a blocking filehandle: I mean, if the `open` did not fail, you can be sure to read all the lines from the filehandle.

My third question is about `EAGAIN()` and the retry. I didn't understand this part of the code:

```perl
$hl->{$_}->{retry}   = 0;
$hl->{$_}->{retries} = 0;
my $start     = time;
my $blocksize = 1024;
while (scalar keys %hltodo) {
  machine: for (keys %hltodo) {
    my $out = $hl->{$_}->{chld_out};
    # begin to read
    my $bytes_read = -1;
    while ($bytes_read) {
      my $buf;
      my $bytes_read = sysread($out, $buf, $blocksize);
      if (defined($bytes_read)) {
        if ($bytes_read == 0) {
          # eof
          close($out);
          last;
        } else {
          $hl->{$_}->{data} .= $buf;
        }
      } else {
        if ($! == EAGAIN()) {
          # retry
          $hl->{$_}->{retry}++;
          $hl->{$_}->{retries}++;
          if ($hl->{$_}->{retry}) {
            $hl->{$_}->{retry} = 0;
            next machine;
          }
          usleep 10;
        } else {
          last;
        }
      }
    }
    delete $hl->{$_}->{"chld_out"};
    delete $hltodo{$_};
  }
  # kill remaining pids if timeout reached
  if ($opt{timeout} && time > $start + $opt{timeout}) {
    print STDERR "Timeout for: ", join (" ", keys %hltodo),
                 " killing ", join (" ", values %hltodo);
    kill 1, values %hltodo;
    %hltodo = ();
  }
}
```

When the non-blocking filehandle is blocked for whatever reason, `$!` is set to EAGAIN. Then `$hl->{$_}->{retry}++` makes it 1, so it goes to `$hl->{$_}->{retry} = 0` and `next machine`, and it will never do the `usleep 10` (10 microseconds)? Am I missing something in this part?
Re^3: parallel process on remote machines, read results and handle timeout of those processes by BrowserUk (Patriarch) on Oct 31, 2014 at 12:55 UTC

EAGAIN means that the read would have blocked: whilst select told you there was something available on the handle, at the exact moment you tried to read it something in the system or TCP stack was busy, so rather than block, `sysread` returns undef with `$!` set to EAGAIN and lets you do something else in the meantime before trying again.

I agree with you that the retry logic in your code snippet is borked. It will only attempt one retry and will never do the usleep. What you choose to do about that is up to you. Personally, I think I'd probably omit the retry logic completely and just do the microsleep and loop back to the select; but you should probably consult someone with more *nix experience than me if that is your platform.
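Following the suggestion to drop the retry counters and simply go back to select, here is a minimal sketch of the read step for Linux. `read_available` is a hypothetical helper name, not something from the original script. It drains whatever is readable right now and reports one of three outcomes, so the caller never needs `retry`/`retries` bookkeeping.

```perl
use strict;
use warnings;
use IO::Handle;
use Errno qw(EAGAIN);

# Hypothetical helper: drain whatever is currently readable from a
# non-blocking handle into $$bufref. Returns 'eof' when the writer has
# finished, 'again' when there is nothing more right now (go back to
# select), and dies on any other read error.
sub read_available {
    my ($fh, $bufref, $blocksize) = @_;
    $blocksize ||= 1024;
    while (1) {
        my $n = sysread($fh, my $chunk, $blocksize);
        if (defined $n) {
            return 'eof' if $n == 0;   # writer closed its end: no more data ever
            $$bufref .= $chunk;        # got a chunk; there may be more waiting
            next;
        }
        # undef: either "would block" (come back after select) or a real error
        return 'again' if $! == EAGAIN;
        die "sysread failed: $!";
    }
}
```

In a select loop you would call it for each handle returned by `can_read`: on `'eof'` remove and close the handle, on `'again'` just carry on with the loop. Since select already does the waiting, this version leaves out the `usleep` altogether; that is a choice made for the sketch, not something settled in this thread.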
Re^4: parallel process on remote machines, read results and handle timeout of those processes by x12345 (Novice) on Oct 31, 2014 at 15:36 UTC

Thanks for your patience. It is really nice of the Perl experts here to answer questions. Yes, my platform is Linux.

Some questions about nfreeze/Storable. Here is the Perl code that brings the results back to the main program:

```perl
use Storable qw(nfreeze);
my $results_serialized = nfreeze \%testresults;
print $results_serialized;
```

1. What is the advantage of a persistent data structure? Is it that all the data is in the same block of memory, so it is fast? What kind of needs is it suitable for?
2. If I don't use nfreeze, i.e. I just use `return \%testresults`, does that also work?
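A note on question 2, with a minimal sketch: when the results are produced in a separate process (here a local fork standing in for the command on the remote machine), `return \%testresults` cannot carry the data back, because a reference only means something inside the process that created it. The data has to be written out as bytes and rebuilt on the other side. The sketch assumes a Unix-like system, and the contents of `%testresults` are made up for illustration.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Fork a child whose STDOUT is connected to $from_child in the parent.
my $pid = open(my $from_child, '-|');
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: \%testresults would be meaningless to the parent, so
    # serialize the data and write the bytes to STDOUT instead.
    my %testresults = (host1 => { status => 'ok', load => 0.5 });  # made-up data
    binmode STDOUT;
    print nfreeze(\%testresults);
    exit 0;
}

# Parent: slurp every byte, then rebuild the structure with thaw.
binmode $from_child;
my $frozen = do { local $/; <$from_child> };
close $from_child;
my $results = thaw($frozen);
print "host1 status: $results->{host1}{status}\n";
```

`nfreeze` (rather than `freeze`) stores the data in network byte order, so a structure frozen on one machine can be thawed on another with a different architecture, which is relevant when the results come from remote hosts.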
Re^5: parallel process on remote machines, read results and handle timeout of those processes by BrowserUk (Patriarch) on Oct 31, 2014 at 15:56 UTC
Re^6: parallel process on remote machines, read results and handle timeout of those processes by x12345 (Novice) on Nov 04, 2014 at 16:12 UTC