in reply to Why do my threads sometimes die silenty?

Without some clues as to what is going on, it is impossible to diagnose the cause, and therefore difficult to suggest a solution. But I do understand your difficulty in providing those clues.

The single simplest help you could give us is to post the code of a thread procedure that you know has silently died. If you have more than one, post them all.

One trick I've previously used to track down a similar problem is to wrap the thread procedure in a block eval. So for example, if you have code like this:

    sub worker { .... }
    ...
    my @workers = map threads->new( \&worker, ... ), 1 .. $WORKERS;

Make the following simple modification:

    sub worker { .... }

    sub proxy {
        my( $code, @args ) = @_;
        eval{ $code->( @args ) } or print $@;
    }
    ...
    my @workers = map threads->new( \&proxy, \&worker, ... ), 1 .. $WORKERS;

This may yield some clues. But posting the code of thread subs that are known to (sometimes) die silently would likely get better answers.
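A slightly fuller sketch of the same wrapper idea follows. It is only an illustration, not code from the original poster's application: the `worker` body and the worker count are hypothetical. One detail worth noting is that `eval { ... } or print $@` also fires when the worker merely returns a false value, printing an empty `$@`, so this version ends the eval block with `1` and keeps the worker's return value separately.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker, used only for this illustration: it dies on odd ids.
sub worker {
    my ($id) = @_;
    die "worker $id failed\n" if $id % 2;
    return "worker $id ok";
}

# Run $code inside an eval and report any exception as soon as it happens.
# The trailing 1 makes the eval true unless $code actually died.
sub proxy {
    my ( $code, @args ) = @_;
    my $result;
    my $ok = eval { $result = $code->(@args); 1 };
    warn sprintf( "thread %d died: %s", threads->tid, $@ ) unless $ok;
    return $result;
}

my @workers = map { threads->new( \&proxy, \&worker, $_ ) } 1 .. 4;
my @results = map { $_->join } @workers;
print defined $_ ? "$_\n" : "(no result)\n" for @results;
```

Because the exception is caught inside the thread, the warning carries the thread id and appears at the point of failure, while `join` still returns whatever the worker produced (or `undef` if it died).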


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Why do my threads sometimes die silenty?
by alain_desilets (Beadle) on Sep 21, 2011 at 20:25 UTC
    I think what's happening is this:

    • The master thread starts the slave thread
    • The master thread then waits for messages from the slave to appear in a shared message queue
    • The slave thread dies before it gets to send even a first message
    • As a result, the master thread never joins the slave thread, and this causes the error message to not be printed
    Below is a piece of code that illustrates this:
        use strict;
        use threads;
        use threads::shared;

        my $fct = sub {
            # eval {require Blah; Blah->import()} or print $@;
            require Blah;
            Blah->import();
        };

        my $thr = threads->new($fct);
        #$thr->join();
    When executed as is, it produces the following output:
        Perl exited with active threads:
            1 running and unjoined
            0 finished and unjoined
            0 running and detached
    If I remove the comment in front of the last line (the join call), then I get the error message:
        Thread 1 terminated abnormally: Can't locate Blah.pm in @INC (@INC contains: etc...
    Note also that the eval {} trick doesn't seem to make any difference. In other words, if I remove the comment in front of the eval {} line but leave the join() statement commented out, I still don't get the error message.

    Note also that the silent dying does not happen if, for example, the error is caused by a division by zero instead of a nonexistent include file.

    Also interesting is the fact that if I do "use Blah;" instead of "require Blah; Blah->import()", then I see the error message, even if the join call is commented out.

    In case you are wondering why I use a require instead of use, it's because loading the Blah module may end up loading modules which are not thread-safe. Therefore, I want to load the module at run-time, not at compile time (as explained on this node: http://www.perlmonks.org/?node_id=288022).
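As a rough sketch of the run-time loading pattern described above: the module is loaded inside the thread body, wrapped in an eval so that a load failure is returned to the parent instead of being lost. `List::Util` is only a stand-in for the real module here (an assumption for illustration, it is not claimed to be thread-unsafe).

```perl
use strict;
use warnings;
use threads;

my $thr = threads->create( sub {
    # Load the module at run time, inside the thread, rather than with
    # a compile-time "use" at the top of the script.
    my $ok = eval { require List::Util; 1 };
    return "load failed: $@" unless $ok;
    return List::Util::sum( 1 .. 4 );
} );

my $result = $thr->join;    # scalar context, so join returns one value
print "$result\n";
```

Returning the error string from the thread is one option; printing it inside the thread with `warn` would work as well.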

      With the join commented out, the main thread (and therefore the entire process) terminates before the slave thread ever gets a timeslice. Therefore Perl never attempts to execute the require and so no error message is produced.

      The point here is that the code in the new thread does not get run immediately when you execute the threads->create(). It will not be run until some time later when the OS gets around to allocating it a timeslice. On a single core system this wouldn't happen until the main thread completes its timeslice. On a multi-core system it may get run concurrently with the main thread depending upon the current state of the OS load (including all other processes and system threads). But there is no way to predict how much later that will be.

      If you added a sleep to the main thread (at the same position as the commented out join), then you would see the error because the slave thread would get a timeslice while the main thread was sleeping.
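A minimal sketch of that scheduling point, using a shared flag instead of a missing module (the flag name and the `die` message are made up for the example). Before `join`, nothing guarantees the thread body has run; after `join` returns, it has run to completion.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ran :shared = 0;

# The thread body records that it ran, then dies.
my $thr = threads->create( sub { $ran = 1; die "worker failed\n" } );

# At this point the new thread may not have been scheduled yet,
# so $ran could still be 0 here.

$thr->join;                 # blocks until the thread body has finished
print "ran = $ran\n";
```

A `sleep` in place of the `join` usually has the same visible effect, but only `join` actually waits for the thread rather than for an arbitrary amount of time.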

      But why would you comment out the join?

        If you added a sleep to the main thread (at the same position as the commented out join), then you would see the error because the slave thread would get a timeslice while the main thread was sleeping.
        Interesting... indeed, if I put a sleep at the end of the script, I do indeed see the error.
        But why would you comment out the join?
        I did this to simulate a situation where the main thread gets stuck waiting for some message from the slave thread, which never actually comes because the thread died before sending it. In that situation, the master thread never gets a chance to execute a join(), and so I don't get to see the error message, even if I press Ctrl-C. Or so I thought... Your comment leads me to think that this is not the case. Indeed, here's a piece of code that proves it:
            use strict;
            use threads;
            use threads::shared;

            my $signal = undef;
            share($signal);

            my $fct = sub {
                require Blah;
                Blah->import();
                $signal = 1;
            };

            my $thr = threads->new($fct);

            while (!$signal) {
                print "Sleeping and waiting for signal from slave thread\n";
                sleep(1);
            }
            print "Got message from slave thread\n";
            $thr->join();
        When I execute this, I get:
            Sleeping and waiting for signal from slave thread
            Thread 1 terminated abnormally: Can't locate Blah.pm in @INC (@INC contains: etc...
            Sleeping and waiting for signal from slave thread
            Sleeping and waiting for signal from slave thread
            etc...
        In other words, the master does get stuck waiting for a message that will never come, but the error is still printed.

        Like I said earlier, I am having difficulty reproducing the exact circumstances "in vitro". I'll keep working at it and post new developments here. One difference between my app and the above simple example is that I am actually using a Thread::Pool to run the slave.
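One way to keep the master from waiting forever in a setup like the one above is to check whether the slave is still alive on each pass. The sketch below assumes a plain `threads` object rather than Thread::Pool, and uses a `die` as a stand-in for the failing `require`. It relies on the `is_running` and `error` methods of the `threads` module.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $signal :shared = 0;

my $thr = threads->create( sub {
    die "simulated load failure\n";    # stands in for the failing require
    $signal = 1;                       # never reached
} );

# Wait for the signal, but give up as soon as the thread has stopped running.
until ($signal) {
    last unless $thr->is_running;
    sleep 1;
}

my $err = $thr->error;    # undef if the thread finished normally
$thr->join;
print defined $err ? "slave died: $err" : "Got message from slave thread\n";
```

With this check the loop ends shortly after the slave dies, and the master can report the reason instead of sleeping indefinitely.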