in reply to Re: Perl Thread Quitting Abnormally
in thread Perl Thread Quitting Abnormally

Hi

ActivePerl 5.10 on Windows 2003 Server.

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 3 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 1002 283697 provided by ActiveState http://www.ActiveState.com
Built Jan 10 2008 11:00:53

The equipment is microwave radio links; the threaded code uses SNMP_Session to upload inventory statistics via SNMPv1.

Small is ~20 NEs.

Large is ~2000 NEs.

The number of threads is controllable via a config file. Problems have been seen anywhere between 20 and 50 threads; I'm now trying with fewer threads to see where the problems start.

I'm the only one logged into the system. Other Perl processes are also running; they are not being affected.

Thanks

graham

Re^3: Perl Thread Quitting Abnormally
by BrowserUk (Patriarch) on Jul 05, 2010 at 22:38 UTC
    Thread 20 terminated abnormally: panic: COND_SIGNAL (298) at cm.pl line 487. Terminating on signal SIGINT(2)

    The error breaks down into several parts:

    1. Thread 20 terminated abnormally:

      Fairly obviously means that thread number 20 was started, but it terminated as a result of something other than return or 'running off the end of the sub'.

    2. panic: COND_SIGNAL (298) at cm.pl line 487.

      This is the reason it terminated. Perl (threads.pm) itself terminated it because of an unexpected internal error condition (panic).

      In this case, the code is executing (either explicitly in your code or implicitly through perl internal code):

      #define COND_SIGNAL(c) \
          STMT_START { \
              if ((c)->waiters > 0 && \
                  ReleaseSemaphore((c)->sem,1,NULL) == 0) \
                  Perl_croak_nocontext("panic: COND_SIGNAL (%ld)",GetLastError()); \
          } STMT_END

      The 298 is the system error code returned by GetLastError after the ReleaseSemaphore() call fails.

      It translates to "Too many posts were made to a semaphore." which is further explained as

      There is a limited number of times that an event semaphore can be posted. You tried to post an event semaphore that has already been posted the maximum number of times.

      If you are making extensive use of threads::shared::cond_* calls in your code, that could be the root of the problem. If you want help in debugging that further, you will have to "show us the code".

      If you are not using the cond_* calls in your code, then it could be that you've unearthed a bug in Perl itself. You might try upgrading your version of Perl to (say) 5.10.1, and/or your versions of threads and threads::shared.

    3. Terminating on signal SIGINT(2)

      Under most circumstances, this will only occur if you (or someone) types ^C. Did you fail to mention that your application is hanging and you get the above error message only when you interrupt it?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi,

      Thanks for the help; that gives me something to investigate. For now, simply turning the number of threads down to 5 seems to have helped.

      I do use $threads$threadno->kill('STOP'); in the code to stop threads that go on for too long. I then trap this with $SIG{'STOP'} = sub {$end_ne=1;}; and test for the value of $end_ne in the code at the end of any part that may take a while. Could this be causing the semaphore errors?

      I can't just kill the long-running thread and re-create it; I need to back out of the current NE in that thread and set it to a state where the handler can pass it a new NE to try. Unfortunately Perl uses memory up (~5MB from memory) every time a thread starts and doesn't release it until the whole program exits. Given the script could run on 5000 NEs, a major network interruption could therefore see nearly 5000 threads created and killed!!

      I'm not typing ^C and no one else is logged in.

      Thanks

      Graham

        I do use $threads$threadno->kill('STOP'); in the code to stop threads that go on for too long. I then trap this with $SIG{'STOP'} = sub {$end_ne=1;}; and test for the value of $end_ne in the code at the end of any part that may take a while. Could this be causing the semaphore errors?

        Quite likely.

        I do not use signals in conjunction with threads, as my initial experiments with them showed that they a) often seemed to be the source of mysterious problems; b) made for hard-to-debug code; c) achieved nothing that was not more easily and better achieved in other ways.

        For example, for your purpose of interrupting a long running thread by polling the state of a variable, simply making that variable shared and then setting it true from a different thread, achieves the same end without the additional complexities of out-of-line callbacks and all the nastiness that underlies them:

        my @end_ne :shared = (0) x NTHREADS;
        ...
        sub threadHandler{
            my $tid = threads->tid;
            ...
            if( $end_ne[ $tid ] ) {
                return;
            }
            ...
        }
        ...
        if( time() > ... ) {
            $end_ne[ $someTid ] = 1;
        }

        Unfortunately Perl uses memory up (~5MB from memory) every time a thread starts and doesn't release it until the whole program exits.

        Hm. Sounds like you are failing to join your old threads, as that is the only way they would continue to consume memory after death. (Most of) Their memory will not be returned to the OS, but it will be returned to the process memory pool for reuse, unless you fail to join them.
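
As a minimal sketch of the joining step described above (not the poster's code): `threads->list(threads::joinable)` returns the threads that have finished but have not yet been joined, so a controller loop can reap them periodically. The helper name `reap_finished` and the three do-nothing threads are hypothetical, for illustration only.

```perl
use strict;
use warnings;
use threads;

# Join every thread that has finished but has not been joined yet.
# Returns the number of threads joined on this pass.
sub reap_finished {
    my $count = 0;
    for my $thr ( threads->list(threads::joinable) ) {
        $thr->join;
        ++$count;
    }
    return $count;
}

# Hypothetical usage: start a few short-lived threads, then reap them.
threads->create( sub { return } ) for 1 .. 3;

my $reaped = 0;
while ( $reaped < 3 ) {
    $reaped += reap_finished();
    threads->yield;    # let the worker threads run
}
print "joined $reaped thread(s)\n";
```

In a long-running controller the same call would typically sit inside the main loop, so that dead threads never accumulate.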

        By way of demonstration, the following program starts (checks memory), creates 50 concurrent threads (checks memory), and then signals one thread to die and replaces it with another until 5000 threads have been created and destroyed.

        After the first 50 are created, the memory stands at 123.4 MB. Subtracting the start-up size of 6.6 MB, that gives 2.3 MB/thread. It then goes on to create and destroy 4950 more threads in quick succession--takes about a minute on my system--and when it's done the total process memory pool has increased to 137.1 MB. Subtract that used by the first 50 and you get 13.7 MB/4950 = 0.00277 MB/thread. That's just about 3k, and is probably just caused by heap fragmentation.
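
The per-thread figures above can be re-derived with a few lines of Perl. This is only the arithmetic from the post; the three memory readings are the ones quoted there, and the helper name `mb_per_thread` is made up for this sketch.

```perl
use strict;
use warnings;

# Memory readings quoted in the post, in MB.
my $startup   = 6.6;      # before any threads exist
my $after_50  = 123.4;    # with 50 concurrent threads running
my $after_all = 137.1;    # after 4950 further create/destroy cycles

# Average memory growth per thread between two readings.
sub mb_per_thread {
    my ( $before, $after, $count ) = @_;
    return ( $after - $before ) / $count;
}

my $live     = mb_per_thread( $startup,  $after_50,  50 );
my $recycled = mb_per_thread( $after_50, $after_all, 4950 );

printf "live thread:     %.2f MB\n", $live;
printf "recycled thread: %.5f MB (%.1f KB)\n", $recycled, $recycled * 1024;
```

The first figure is the cost of a thread that is still alive; the second is the residue left per thread once its memory has gone back to the process pool, roughly three orders of magnitude smaller.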

        Not that I would advocate this method of threading for your application--a pool of threads is the right way to go--but it does lay bare one of many misinterpretations that are made about threaded code.
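
For comparison, here is a rough sketch of what a pool of threads might look like, assuming the core Thread::Queue module is available. It is not the original application: `$WORKERS`, `poll_ne`, `worker` and `run_pool` are hypothetical names, the SNMP work is elided, and the timeout logic that would set an abort flag is left out. Each worker polls a shared flag between steps, so it can back out of one NE and move on to the next without signals and without being destroyed.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $WORKERS = 3;                         # hypothetical pool size
my $queue   = Thread::Queue->new;        # NEs waiting to be polled
my @abort :shared = (0) x $WORKERS;      # one "give up on this NE" flag per worker
my @done  :shared;                       # results, for illustration only

# Hypothetical stand-in for the SNMP work on one NE. It checks the
# worker's abort flag between steps instead of relying on a signal.
sub poll_ne {
    my ( $slot, $ne ) = @_;
    for my $step ( 1 .. 3 ) {
        return 0 if $abort[$slot];       # back out of this NE only
        # ... one SNMP request would go here ...
    }
    return 1;
}

# Worker loop: take NEs from the queue until an undef marker arrives.
sub worker {
    my ($slot) = @_;
    while ( defined( my $ne = $queue->dequeue ) ) {
        $abort[$slot] = 0;               # clear the flag for the new NE
        my $ok = poll_ne( $slot, $ne );
        lock(@done);
        push @done, join( ':', $ne, ( $ok ? 'ok' : 'aborted' ) );
    }
    return;
}

# Start the pool, feed it the NEs, then join every worker.
sub run_pool {
    my @nes = @_;
    { lock(@done); @done = (); }
    my @pool = map { threads->create( \&worker, $_ ) } 0 .. $WORKERS - 1;
    $queue->enqueue( @nes, (undef) x $WORKERS );   # one stop marker per worker
    $_->join for @pool;
    return sort @done;
}

print "$_\n" for run_pool( map { "NE$_" } 1 .. 10 );
```

A controller would set `$abort[$slot] = 1` from the main thread when a worker has spent too long on its current NE. Because the same few threads are reused for every NE, the number of threads ever created stays at the pool size regardless of how many NEs time out.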

        #! perl -slw
        use strict;
        use threads ( stack_size => 4096 );
        use threads::shared;

        my @end :shared = (0) x 5000;

        sub thread {
            my $tid = threads->tid;
            Win32::Sleep( 10 ) until $end[ $tid ];
            --$end[ $tid ];
            return;
        }

        printf "Check memory: "; <>;

        threads->create ( \&thread )->detach for 1 .. 50;

        printf "Check memory: "; <>;

        for my $tid ( 1 .. 4950 ) {
            printf "\r$tid";
            ++$end[ $tid ];
            Win32::Sleep( 10 ) while $end[ $tid ];
            threads->create ( \&thread )->detach;
        }
        ++$end[ $_ ] for 4950 .. 5000;
        printf "\nCheck memory: "; <>;

        __END__
        c:\test>t-junk.pl
        Check memory: 6.6 MB
        Check memory: 123.4 MB
        4950
        Check memory: 137.1 MB

        On the basis of the scant description of your application, I think that it could probably be greatly improved with a few tweaks to the mechanisms you are using for 'command & control'.

        Is it possible for you to post the shell of the application--the main code where you create the threads and thread procedure showing the outline of the control mechanisms with the guts of the non-thread related code elided?

        I'm not typing ^C and no one else is logged in.

        Something is causing your process to receive a SIGINT. It may be that your SIGSTOP is being internally translated into a SIGINT by the signals emulation code--the Perl signals emulation on Windows does not directly support SIGSTOP. Or this could be some uncharted interaction between the signals emulation in the core and that layered on top by the threads signals. (Which should never have been added in the first place IMO.)

        Again, if you can post your code--with most of the SNMP stuff elided--it might be possible for me to re-create the problem locally and track down the source.



        I don't think that mixing signals and threads actually works, especially under Windows, where signals are emulated at best.