I agree that yield and sleep won't solve it reliably - actually, I indicated that in my previous node.
The problem is the window between signaling the main thread and setting eof, and the window between testing whether eof is true and waiting on the shared variable. Those steps need to happen atomically: you don't want to start waiting for more data once eof is true.
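To make the atomicity point concrete, here is a minimal, self-contained sketch (not the code from this thread) of a one-slot handoff in which the data slot and the end-of-input flag are always tested under the same lock. The names $slot and $done and the three sample lines are hypothetical stand-ins for $$sharedDataRef, $$doneRef and the file being copied. The consumer evaluates `defined $slot || $done` and enters cond_wait() while still holding the lock (cond_wait() releases it atomically), so the producer cannot set $done in the gap between the test and the wait.

```perl
use strict;
use warnings;
use threads;            # must be loaded before threads::shared
use threads::shared;

# Hypothetical one-slot handoff: $slot holds one line, $done marks end of input.
# Both are read and written only while holding the lock on $slot.
my $slot :shared;
my $done :shared = 0;

my $producer = threads->create(sub {
    for my $line ("alpha\n", "beta\n", "gamma\n") {
        lock $slot;
        cond_wait($slot) while defined $slot;   # wait until the consumer has emptied the slot
        $slot = $line;
        cond_signal($slot);                     # wake the consumer if it is waiting
    }
    lock $slot;
    $done = 1;             # set under the same lock the consumer tests it with
    cond_signal($slot);
});

my @received;
LINE: while (1) {
    my $line;
    {
        lock $slot;
        # The predicate test and the call to cond_wait() happen under one lock,
        # so the producer cannot fill the slot or set $done in between.
        cond_wait($slot) until defined $slot || $done;
        last LINE unless defined $slot;   # done, and nothing left to consume
        $line = $slot;
        $slot = undef;
        cond_signal($slot);               # tell the producer the slot is free again
    }
    push @received, $line;                # "process" the line outside the lock
}
$producer->join;
print scalar(@received), " lines received\n";
```

The labelled `last LINE` matters: a plain `last` inside the bare block would only leave that block, not the while loop.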
How about this: replace getDataT with the version below. It sets $done before signaling that $sharedData is ready, which removes the race condition:
sub getDataT {
    my ( $handle, $sharedDataRef, $doneRef ) = @_;
    my $temp;
    while( !$$doneRef ) {
        warn "t-Locking" . $/;
        lock $$sharedDataRef;
        warn 't-Waiting' . $/;
        cond_wait( $$sharedDataRef ) while $$sharedDataRef;
        warn 't-Setting' . $/;
        $$sharedDataRef = $handle->getline;
        # set $done before handing the data over to the main thread
        $$doneRef = 1 if $handle->eof;
        warn 't-Signalling' . $/;
        cond_signal( $$sharedDataRef );
    }
    return;
}
I would expect that to scale gracefully.
Have you looked at what Thread::Semaphore actually does?
Just the docs. Now I have read the source... point taken ;)
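For anyone else who hasn't opened it: as far as I can tell, the module is built on the same primitives discussed here, a shared counter that is only changed under its lock, with waiters re-testing the count in a loop. Below is a stripped-down sketch of that pattern. MiniSemaphore is a hypothetical name and this is not the real module's code, so check the Thread::Semaphore source for the actual details.

```perl
use strict;
use warnings;
use threads;            # must be loaded before threads::shared

# Hypothetical stripped-down counting semaphore (not Thread::Semaphore itself):
# the count lives in a shared scalar, and every change happens under its lock.
package MiniSemaphore;
use threads::shared;    # imports cond_wait/cond_broadcast into this package

sub new {
    my ($class, $count) = @_;
    my $val :shared = defined $count ? $count : 1;
    return bless \$val, $class;
}

sub down {
    my ($self) = @_;
    lock $$self;
    cond_wait($$self) until $$self > 0;   # re-check the count after every wakeup
    $$self--;
    return;
}

sub up {
    my ($self) = @_;
    lock $$self;
    $$self++;
    cond_broadcast($$self);   # wake all waiters; each re-tests the count itself
    return;
}

package main;

# Usage: the worker blocks in down() until the main thread calls up().
my $sem    = MiniSemaphore->new(0);
my $worker = threads->create(sub { $sem->down; return 'got it' });
$sem->up;
my $result = $worker->join;
print "$result\n";
```

Note that it never depends on a signal arriving at the right moment: the count itself records how many up() calls happened, which is why an early up() is not lost.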
It still hangs when tracing is enabled, except that now it hangs every time (20 out of 20 attempts). I've posted the modified code below in case I screwed something up. (I assumed the my $temp; was an artifact?)
I had also tried several variations on this theme, without success.
I've also tried locking $done and $$doneRef directly, more out of desperation than logic, but it made no difference.
Okay, I've run it multiple times. It hangs for me too, but only intermittently: about 1 run in 25, and more often when the machine is under heavy load. I moved the warn statements after the lock in both subs, and moved the "warn m-processing" above the signal to reduce the time the variable was unlocked. That seemed to help, but how do we quantify it?
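One rough way to quantify it is to run the script many times with a timeout and count how many runs hang. The sketch below assumes a Unix-like system with fork(); the run_once helper, the 30-second timeout and the default of 25 runs are hypothetical choices, and any environment variables (such as TRACE) set in the shell are inherited by each run.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Run a command once; report 'ok' if it exits within $timeout seconds,
# otherwise kill it and report 'hung'.
sub run_once {
    my ($cmd, $timeout) = @_;
    defined(my $pid = fork) or die "fork failed: $!";
    if ($pid == 0) {
        exec @$cmd or exit 127;   # child: replace ourselves with the command
    }
    my $deadline = time + $timeout;
    while (time < $deadline) {
        return 'ok' if waitpid($pid, WNOHANG) == $pid;
        sleep 0.1;                # poll ten times a second
    }
    kill 'KILL', $pid;
    waitpid($pid, 0);             # reap the killed child
    return 'hung';
}

# Hypothetical driver: perl hang_rate.pl SCRIPT [RUNS]
my ($script, $runs) = @ARGV;
if (defined $script) {
    $runs ||= 25;
    my $hangs = grep { run_once([$^X, $script], 30) eq 'hung' } 1 .. $runs;
    printf "%d of %d runs hung (%.1f%%)\n", $hangs, $runs, 100 * $hangs / $runs;
}
```

With a few hundred runs before and after a change, the two hang rates give at least a rough comparison, bearing in mind that machine load affects the result.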
The $done race was not the only one. Looking at the TRACE output, the warn statements don't execute in a consistent order. I guess that's to be expected, since the threads run asynchronously.
However, when it hangs, I see one of two things: a missed signal or a signal being raised when the other thread isn't waiting.
Now threads::shared says the following about the second condition:
If there are no threads blocked in a "cond_wait" on the variable,
the signal is discarded. By always locking before signaling, you can
(with care), avoid signaling before another thread has entered
cond_wait().
Uh... what does "with care" mean here?
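My reading of "with care" is that the waiter must never rely on the signal alone: the event has to be recorded in the shared variable under the lock, and the waiter must test that variable (under the same lock) before and after every cond_wait(). Then a signal that is discarded because nobody was waiting yet does no harm. A minimal sketch, with $ready as a hypothetical flag and the sleep(1) there only to make the early signal likely:

```perl
use strict;
use warnings;
use threads;            # must be loaded before threads::shared
use threads::shared;

my $ready :shared = 0;  # hypothetical flag recording that the event happened

my $waiter = threads->create(sub {
    sleep 1;            # very likely lets the signal below go out before we wait
    lock $ready;
    # Check the shared state first; only wait if the event has not happened yet.
    cond_wait($ready) until $ready;
    return 'saw ready';
});

{
    lock $ready;
    $ready = 1;           # record the event in shared state...
    cond_signal($ready);  # ...then signal; if nobody is waiting, it is discarded
}

my $result = $waiter->join;
print "$result\n";
```

With a bare cond_wait($ready) in place of the until loop, the same ordering would very likely leave the waiter blocked forever, which matches the "signal raised when the other thread isn't waiting" hangs.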
Two more random notes:
- Random lines (from the file being copied) get skipped when TRACE=1, again because of missed signals.
- FWIW, Perl 5.8.4 on Debian runs this correctly every time.
My conclusion is that there's too much happening between lock, wait and signal.