in reply to Re^3: Printing to STDERR causes deadlocks.
in thread Printing to STDERR causes deadlocks.

It still hangs when tracing is enabled, except now it hangs every time (20 attempts out of 20). I've posted the modified code below in case I screwed something up. (I assumed the my $temp; line was an artifact; was it?)

#! perl -slw
use strict;
use IO::File;
use threads;
use threads::shared;

BEGIN {
    $| = 1;
    our $TRACE ||= 0;
    print "TRACE=$TRACE";
    *CORE::GLOBAL::warn = sub {} unless $TRACE;
}

sub processData {
    printf @_;
}

sub getDataT {
    my ( $handle, $sharedDataRef, $doneRef ) = @_;
    while( !$$doneRef ) {
        warn "t-Locking" . $/;
        lock $$sharedDataRef;
        warn 't-Waiting' . $/;
        cond_wait( $$sharedDataRef ) while $$sharedDataRef;
        warn 't-Setting' . $/;
        $$sharedDataRef = $handle->getline;
        # set $done before handing the data over to the main thread
        $$doneRef = 1 if $handle->eof;
        warn 't-Signalling' . $/;
        cond_signal( $$sharedDataRef );
    }
    return;
}

my $handle = IO::File->new( $ARGV[ 0 ], 'r' );
my $sharedData :shared;
my $done :shared = 0;

threads->create( \&getDataT, $handle, \$sharedData, \$done );

while( !$done ) {
    warn 'm-locking' . $/;
    lock $sharedData;
    warn 'm-waiting' . $/;
    cond_wait $sharedData until $sharedData || $done;
    warn 'm-Copying' . $/;
    my $localCopy = $sharedData;
    warn 'm-undefing' . $/;
    undef( $sharedData );
    warn 'm-signalling' . $/;
    cond_signal $sharedData;
    warn 'm-processing' . $/;
    processData( $localCopy );
}

I'd also tried several variations on this theme, without success.

I've also tried using locking directly on $done and $$doneRef, more out of desperation than logic, but it made no difference.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco.
Rule 1 has a caveat! -- Who broke the cabal?

Re^5: Printing to STDERR causes deadlocks.
by bmann (Priest) on Apr 27, 2005 at 01:50 UTC
    Okay, I've run it multiple times. It hangs for me too, but randomly: about 1 run in 25, and more often when the machine is under heavy load. I moved the warn statements after the lock in both subs, and moved the "warn m-processing" above the signal to reduce the time the variable was unlocked. That seemed to help, but how do we quantify it?
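One rough way to put a number on "about 1 run in 25" is to run the script many times under a timeout and count the runs that have to be killed. A minimal sketch, assuming a POSIX system with a real fork() (not the Win32 fork emulation); count_hangs is a hypothetical helper, and the timeout is a guess that has to comfortably exceed a normal run:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG _exit);
use File::Spec;
use Time::HiRes qw(time sleep);

# Hypothetical helper: run @cmd $runs times and count how many runs are
# still alive after $timeout seconds. Those runs are killed and counted as hangs.
sub count_hangs {
    my ($runs, $timeout, @cmd) = @_;
    my $hangs = 0;
    for (1 .. $runs) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: throw away the copied data; only hangs are of interest.
            open STDOUT, '>', File::Spec->devnull or _exit(127);
            exec { $cmd[0] } @cmd or _exit(127);
        }
        my $deadline = time + $timeout;
        my $finished = 0;
        while (time < $deadline) {
            if (waitpid($pid, WNOHANG) == $pid) { $finished = 1; last }
            sleep 0.05;    # poll every 50ms
        }
        next if $finished;
        kill 'KILL', $pid;    # still running after the timeout: call it a hang
        waitpid($pid, 0);     # reap the killed child
        $hangs++;
    }
    return $hangs;
}
```

For example, count_hangs(100, 10, $^X, 'deadlock.pl', '-TRACE=1', 'input.txt') would run a (hypothetically named) copy of the script above 100 times with tracing switched on through its -s switch. The resulting rate is still only an estimate, since it depends on machine load.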

    The $done race was not the only one. Looking at the TRACE output, the warn statements don't get executed uniformly. I guess that's to be expected, since the threads run asynchronously.

    However, when it hangs, I see one of two things: a missed signal or a signal being raised when the other thread isn't waiting.

    Now threads::shared says the following about the second condition:

    If there are no threads blocked in a "cond_wait" on the variable, the signal is discarded. By always locking before signaling, you can (with care), avoid signaling before another thread has entered cond_wait().
    Uh... what does "with care" mean?
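As far as I can tell from the threads::shared docs, the "care" relies on cond_wait() releasing the lock while it sleeps and re-acquiring it before it returns. If the signalling thread has to hold the same lock, it can only get it once the waiter is actually inside cond_wait(), so the signal cannot arrive too early. A small sketch of just that guarantee (not the copying code above; the variable names are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ready :shared = 0;    # the state the two threads agree on
my @log   :shared;        # records the order in which things happen

my $child;
{
    lock $ready;                        # main holds the lock from here on...
    $child = threads->create(sub {
        lock $ready;                    # ...so this blocks until main is inside cond_wait
        push @log, 'child: got the lock';
        $ready = 1;
        cond_signal($ready);            # main must already be waiting at this point
        return;
    });
    push @log, 'main: about to wait';
    # cond_wait releases the lock on $ready while sleeping and re-acquires it
    # before returning; that release is what lets the child's lock() succeed.
    cond_wait($ready) until $ready;
    push @log, 'main: woke up holding the lock again';
}
$child->join;
print "$_\n" for @log;
```

The until loop is the other half of the care: the state is re-tested on every wakeup, so a spurious wakeup, or a signal sent while nobody was waiting, does no harm.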

    Two more random notes:

    1. Random lines (from the file being copied) get skipped when TRACE=1, again because of missed signals.
    2. FWIW, 5.8.4 on Debian consistently runs this correctly.
    My conclusion is that there's too much happening between lock, wait and signal.
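If too much is happening between lock, wait and signal, one way to shrink that window is to hand the lines over through Thread::Queue, which does its own locking and waiting on a shared array. Note that this swaps in a different technique: the queue is unbounded, so the reader thread can run ahead of the main thread instead of handing over one line at a time. A sketch, with an in-memory list standing in for the file and undef assumed as the end-of-data marker:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q     = Thread::Queue->new;
my @lines = map { "line $_\n" } 1 .. 5;    # stand-in for reading the file

my $reader = threads->create(sub {
    $q->enqueue($_) for @lines;
    $q->enqueue(undef);                    # end-of-data marker
    return;
});

my @processed;
while (defined(my $line = $q->dequeue)) {  # blocks until an item is available
    push @processed, $line;                # "process" the line
}
$reader->join;
print @processed;
```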

      This is the bit I do not understand about the API. In your example below:

      m-processing
      t-Locking
      t-Waiting
      t-Signalling   # if a tree signals in a forest, and no one's listening
      t-Locking
      t-Waiting      <<<< Point A
      m-locking
      m-waiting      # and again, we both wait...

      At point A, t has locked the shared var and goes into the wait state.

      Then m gets a timeslice, finishes processing and loops back to lock() the var. It gets the lock, but t hasn't yet released it?

      Then again, whilst m is processing, the lock it acquired is still in force, so how the hell did t manage to acquire a lock and move forward to the wait/signal/lock/wait steps?

      If it is possible to use this API to synchronise two threads' access to a single var, I'd sure like to see it.
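For what it's worth, here is a sketch of a single-slot handoff that follows the pattern the threads::shared docs seem to intend: every wait is a loop on a predicate, every signal is sent while holding the lock, and the consumer does its processing after releasing the lock. It assumes cond_wait behaves as documented; the item list is a stand-in for $handle->getline, and $done is assumed to be read and written only while $slot is locked:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $slot :shared;        # one-item mailbox; undef means "empty"
my $done :shared = 0;    # set by the producer after its last item

my @items = map { "line $_\n" } 1 .. 5;    # stand-in for $handle->getline

my $producer = threads->create(sub {
    for my $item (@items) {
        lock $slot;
        # Re-test the predicate after every wakeup; never rely on the signal alone.
        cond_wait($slot) while defined $slot;
        $slot = $item;
        cond_signal($slot);              # signal while still holding the lock
    }
    lock $slot;
    $done = 1;                           # $done is only touched under $slot's lock
    cond_signal($slot);
    return;
});

my @received;
ITEM: while (1) {
    my $item;
    {
        lock $slot;
        # If the producer signalled before we got here, the signal was
        # discarded, but defined($slot) or $done already records that fact.
        cond_wait($slot) until defined $slot || $done;
        last ITEM unless defined $slot;  # finished and nothing left to take
        $item = $slot;
        undef $slot;
        cond_signal($slot);              # tell the producer the slot is free
    }                                    # lock released at the end of this block
    push @received, $item;               # "process" without holding the lock
}
$producer->join;
print @received;
```

The producer only waits while the slot is full and the consumer only waits while it is empty and $done is false. Both conditions are checked under the same lock, so the two threads can never both be waiting.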


Re^5: Printing to STDERR causes deadlocks.
by bluto (Curate) on Apr 27, 2005 at 00:12 UTC
    FWIW, this code works fine for me, with and without tracing, on perl 5.8.1 on a dual-CPU 2GHz G5 Mac, with a 20MB text file (696k lines).

    Update: That reminds me. If your perl is ok, suspect pthreads. I've seen some majorly broken Linux versions (not sure what you're using).

      Thanks, bluto. 5.8.1 had many other problems with threads. Maybe fixing those problems affected this, or maybe it is just the different implementations on Mac OS X versus Windows. This stuff has been so little exercised that there is probably no way to tell.

      I no longer have 5.8.1, but I've tried files of various sizes on 5.8.4, 5.8.5 and 5.8.6, without success :(

      I also tried using the two-argument version of cond_wait(), but I'll admit to not understanding how that's meant to work at the Perl level anyway.
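As I read the threads::shared docs, the two-argument form cond_wait(COND, LOCK) takes a shared variable that is not locked, followed by one that is. It unlocks LOCK, sleeps until COND is signalled, and re-locks LOCK before returning. A sketch with made-up variable names; the signaller holds $guard rather than $cond, so the unlocked-variable warning is switched off around cond_signal, as the docs suggest for that case:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my $cond  :shared;        # signalled on, but never locked by the waiter
my $guard :shared;        # the variable that is actually locked
my $ready :shared = 0;    # state both threads only touch while holding $guard

my $child;
{
    lock $guard;
    $child = threads->create(sub {
        lock $guard;                   # succeeds only while main is inside cond_wait
        $ready = 1;
        {
            no warnings 'threads';     # $cond itself is deliberately not locked
            cond_signal($cond);
        }
        return;
    });
    # Two-argument form: unlock $guard, sleep until $cond is signalled,
    # then re-lock $guard before returning. Loop in case of spurious wakeups.
    cond_wait($cond, $guard) until $ready;
}
$child->join;
print "ready=$ready\n";
```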

