in reply to Perl Thread Quitting Abnormally

What OS? What version of Perl? What type of "equipment"? How many is "small numbers"? Ditto "large numbers"?

SIGINT(2) is ^C (on some systems), or could result from a kill 2, $pid from somewhere. On some systems it might be the OS itself terminating the process because it has exceeded some statutory limit or other.

In a nutshell, if you want useful help with this, you're going to have to ante up with considerably more information.

Chances are that, given sufficient information, it will be trivial to fix; but in the meantime you will doubtless get a bunch of "use POE", "use fork" and "threads are broken" advice.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: Perl Thread Quitting Abnormally
by Anonymous Monk on Jul 05, 2010 at 16:44 UTC

    Hi

    Active Perl 5.10 on Windows 2003 Server.

    This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    (with 3 registered patches, see perl -V for more detail)

    Copyright 1987-2007, Larry Wall

    Binary build 1002 283697 provided by ActiveState http://www.ActiveState.com
    Built Jan 10 2008 11:00:53

    Equipment is microwave radio links, the threaded code uses SNMP_Session to upload inventory statistics via SNMPv1.

    Small is ~20 NEs.

    Large is ~2000 NEs.

    The number of threads is controllable via a config file. Problems have been seen anywhere between 20 and 50 threads; I'm now trying with fewer threads to see where the problems start.

    I'm the only one logged into the system. Other Perl processes are also running, they are not being affected.

    Thanks

    graham

      Thread 20 terminated abnormally: panic: COND_SIGNAL (298) at cm.pl line 487. Terminating on signal SIGINT(2)

      The error breaks down into several parts:

      1. Thread 20 terminated abnormally:

        Fairly obviously means that thread number 20 was started, but it terminated as a result of something other than return or 'running off the end of the sub'.

      2. panic: COND_SIGNAL (298) at cm.pl line 487.

        This is the reason it terminated. Perl (threads.pm) itself terminated it because of an unexpected internal error condition (panic).

        In this case, the code is executing (either explicitly in your code or implicitly through perl internal code):

        #define COND_SIGNAL(c) \
            STMT_START { \
                if ((c)->waiters > 0 && \
                    ReleaseSemaphore((c)->sem,1,NULL) == 0) \
                    Perl_croak_nocontext("panic: COND_SIGNAL (%ld)",GetLastError()); \
            } STMT_END

        The 298 is the system error code returned by GetLastError after the ReleaseSemaphore() call fails.

        It translates to "Too many posts were made to a semaphore." which is further explained as

        There is a limited number of times that an event semaphore can be posted. You tried to post an event semaphore that has already been posted the maximum number of times.

        If you are making extensive use of threads::shared::cond_* calls in your code, that could be the root of the problem. If you want help in debugging that further, you will have to "show us the code".

        If you are not using the cond_* calls in your code, then it could be that you've unearthed a bug in Perl itself. You might try upgrading your version of Perl to (say) 5.10.1, and/or your versions of threads and threads::shared.
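For reference, this is roughly what explicit use of the cond_* calls looks like. It is a minimal sketch, not the code from cm.pl: the variable name $ready and the single waiter thread are assumptions made for illustration. Each cond_signal on Windows ends up in the COND_SIGNAL macro shown above.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical shared flag; the lock on $ready guards the predicate.
my $ready :shared = 0;

my $waiter = threads->create(sub {
    lock($ready);
    # Re-test the predicate in a loop: cond_wait can wake spuriously,
    # and the signal may already have been sent before we got the lock.
    cond_wait($ready) until $ready;
    return 'woken';
});

{
    lock($ready);          # must hold the lock while changing the predicate
    $ready = 1;
    cond_signal($ready);   # wake one thread waiting on $ready
}

my $got = $waiter->join();
print "$got\n";
```

If the code never calls cond_wait, cond_signal or cond_broadcast directly, they may still be reached indirectly, for example through Thread::Queue, which is built on them.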

      3. Terminating on signal SIGINT(2)

        Under most circumstances, this will only occur if you (or someone) types ^C. Did you fail to mention that your application is hanging and you get the above error message only when you interrupt it?



        Hi,

        Thanks for the help; that gives me something to investigate. For now, simply turning the number of threads down to 5 seems to have helped.

        I do use $threads[$threadno]->kill('STOP'); in the code to stop threads that run for too long. I then trap this with $SIG{'STOP'} = sub {$end_ne=1;}; and test the value of $end_ne at the end of any part of the code that may take a while. Could this be causing the semaphore errors?

        I can't just kill the long-running thread and re-create it; I need to back out of the current NE in that thread and put the thread into a state where the handler can pass it a new NE to try. Unfortunately Perl uses up memory (~5MB, if I remember correctly) every time a thread starts and doesn't release it until the whole program exits. Given that the script could run on 5000 NEs, a major network interruption could therefore see nearly 5000 threads created and killed!!

        I'm not typing ^C and no one else is logged in.

        Thanks

        Graham
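
The requirement described above (abandon the current NE but keep the thread alive for the next one) can also be sketched without thread signals. What follows is a different technique from the kill('STOP') / $SIG{'STOP'} approach in the post: a fixed pool of workers fed from a Thread::Queue, with a shared hash consulted between slow steps. The names %skip, poll_ne, $WORKERS and the ne-N labels are all hypothetical, and the SNMP work is replaced by a placeholder loop.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $WORKERS = 2;                      # hypothetical pool size
my $jobs    = Thread::Queue->new();   # NE names waiting to be polled
my $done    = Thread::Queue->new();   # "name:status" results
my %skip :shared;                     # NE name => 1 once it should be abandoned

# Placeholder for the real per-NE SNMP work (hypothetical helper).
sub poll_ne {
    my ($ne) = @_;
    for my $step (1 .. 3) {
        return 'abandoned' if $skip{$ne};   # checked between slow steps
        # ... one slow SNMP request would go here ...
    }
    return 'ok';
}

sub worker {
    # The same thread handles one NE after another; undef means shut down.
    while (defined(my $ne = $jobs->dequeue())) {
        $done->enqueue($ne . ':' . poll_ne($ne));
    }
}

my @pool = map { threads->create(\&worker) } 1 .. $WORKERS;

$skip{'ne-2'} = 1;                    # controller decides ne-2 is taking too long
$jobs->enqueue(qw(ne-1 ne-2 ne-3));
$jobs->enqueue((undef) x $WORKERS);   # one shutdown marker per worker
$_->join() for @pool;

my @results = sort map { $done->dequeue_nb() } 1 .. 3;
print "$_\n" for @results;
```

In this sketch the flag is set before the work is queued so that the outcome is predictable; a real controller would presumably set it from a timer once an NE has been running too long. Because the workers are started once and reused, the per-thread start-up cost is paid only $WORKERS times, however many NEs are processed.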