in reply to Re^4: Perl Thread Quitting Abnormally
in thread Perl Thread Quitting Abnormally

I do use $threads[$threadno]->kill('STOP'); in the code to stop threads that go on for too long. I then trap this with $SIG{'STOP'} = sub {$end_ne=1;}; and test for the value of $end_ne in the code at the end of any part that may take a while. Could this be causing the semaphore errors?

Quite likely.

I do not use signals in conjunction with threads, as my initial experiments with them showed that they a) often seemed to be the source of mysterious problems; b) made for hard-to-debug code; c) achieved nothing that was not more easily and better achieved in other ways.

For example, for your purpose of interrupting a long-running thread by polling the state of a variable, simply making that variable shared and then setting it true from a different thread achieves the same end without the additional complexities of out-of-line callbacks and all the nastiness that underlies them:

    my @end_ne :shared = (0) x NTHREADS;
    ...
    sub threadHandler {
        my $tid = threads->tid;
        ...
        if( $end_ne[ $tid ] ) {
            return;
        }
        ...
    }
    ...
    if( time() > ... ) {
        $end_ne[ $someTid ] = 1;
    }

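
A minimal runnable sketch of the shared-flag approach outlined above, for illustration only. The pool size of 4, the body of worker(), and the 10 ms "unit of work" are assumptions. The flag array is indexed by a slot number passed to each thread rather than by threads->tid, purely so the indices start at 0.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

use constant NTHREADS => 4;    # assumed pool size, for illustration only

# One stop flag per worker slot, visible to every thread.
my @end_ne :shared = (0) x NTHREADS;

sub worker {
    my( $slot ) = @_;
    my $steps = 0;
    # Stand-in for a long job: do a small unit of work, then poll the flag.
    until( $end_ne[ $slot ] ) {
        select( undef, undef, undef, 0.01 );    # pretend to work for 10 ms
        ++$steps;
    }
    return $steps;    # the thread leaves on its own; no signal is involved
}

my @workers = map { threads->create( \&worker, $_ ) } 0 .. NTHREADS - 1;

sleep 1;                                    # let them run for a while
$end_ne[ $_ ] = 1 for 0 .. NTHREADS - 1;    # ask each worker to stop

my @steps = map { $_->join } @workers;      # wait for each worker to return
print "slot $_ ran $steps[ $_ ] steps\n" for 0 .. $#steps;
```

The worker only notices the flag between units of work, so the polling points need to be placed wherever a long-running section can safely be abandoned.
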
Unfortunately Perl uses memory up (~5MB from memory) every time a thread starts and doesn't release it until the whole program exits.

Hm. Sounds like you are failing to join your old threads, as that is the only way they would continue to consume memory after death. (Most of) Their memory will not be returned to the OS, but it will be returned to the process memory pool for reuse, unless you fail to join them.
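
As a rough sketch of what joining finished threads can look like (not the original poster's code): the loop below keeps at most a few threads alive, joins every thread that has finished, and starts replacements. The job body, the total of 20 jobs, and the cap of 4 concurrent threads are assumptions made for the example.

```perl
use strict;
use warnings;
use threads;

my $TOTAL = 20;    # assumed number of work items
my $LIMIT = 4;     # assumed cap on concurrent threads

# Hypothetical stand-in for real work.
sub short_job { my( $n ) = @_; return $n * 2 }

my( $started, @results ) = ( 0 );

# Run until every job has been started and every thread has been joined.
while( $started < $TOTAL or threads->list( threads::all ) ) {
    # Join (reap) every thread that has finished, collecting its result.
    push @results, $_->join for threads->list( threads::joinable );

    # Top the set of running threads back up to the limit.
    while( $started < $TOTAL and threads->list( threads::running ) < $LIMIT ) {
        # Explicit scalar context so that join() returns the job's value.
        threads->create( { context => 'scalar' }, \&short_job, ++$started );
    }
    select( undef, undef, undef, 0.01 );    # brief pause between sweeps
}
printf "joined %d threads\n", scalar @results;
```

A thread that is neither joined nor detached stays on the threads->list until it is joined, which is why the outer loop can use that list to decide when it is done.
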

By way of demonstration, the following program starts (check memory), creates 50 concurrent threads (check memory), and then repeatedly signals one thread to die and replaces it with another until 5000 threads have been created and destroyed.

After the first 50 are created, the memory stands at 123.4 MB. Subtracting the start-up size of 6.6 MB leaves 116.8 MB, which gives 2.3 MB/thread. It then goes on to create and destroy 4950 more threads in quick succession--takes about a minute on my system--and when it's done the total process memory pool has increased to 137.1 MB. Subtract the 123.4 MB in use after the first 50 and you get 13.7 MB / 4950 = 0.00276 MB/thread. That's just under 3 KB, and is probably just caused by heap fragmentation.
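
The arithmetic above can be checked with a few lines of Perl. The three readings are the ones quoted from the run; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Memory readings in MB, as reported for the demonstration run.
my( $at_start, $after_50, $after_5000 ) = ( 6.6, 123.4, 137.1 );

my $per_initial = ( $after_50   - $at_start ) / 50;      # cost of a live thread
my $per_churn   = ( $after_5000 - $after_50 ) / 4950;    # residue per recycled thread

printf "%.2f MB per thread for the first 50\n", $per_initial;
printf "%.2f KB per thread for the other 4950\n", $per_churn * 1024;
# prints:
#   2.34 MB per thread for the first 50
#   2.83 KB per thread for the other 4950
```
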

Not that I would advocate this method of threading for your application--a pool of threads is the right way to go--but it does lay bare one of many misinterpretations that are made about threaded code.

    #! perl -slw
    use strict;
    use threads ( stack_size => 4096 );
    use threads::shared;

    my @end :shared = (0) x 5000;

    sub thread {
        my $tid = threads->tid;
        Win32::Sleep( 10 ) until $end[ $tid ];
        --$end[ $tid ];
        return;
    }

    printf "Check memory: "; <>;
    threads->create ( \&thread )->detach for 1 .. 50;
    printf "Check memory: "; <>;

    for my $tid ( 1 .. 4950 ) {
        printf "\r$tid";
        ++$end[ $tid ];
        Win32::Sleep( 10 ) while $end[ $tid ];
        threads->create ( \&thread )->detach;
    }
    ++$end[ $_ ] for 4950 .. 5000;

    printf "\nCheck memory: "; <>;

    __END__
    c:\test>t-junk.pl
    Check memory: 6.6 MB
    Check memory: 123.4 MB
    4950
    Check memory: 137.1 MB

On the basis of the scant description of your application, I think that it could probably be greatly improved with a few tweaks to the mechanisms you are using for 'command & control'.

Is it possible for you to post the shell of the application--the main code where you create the threads and thread procedure showing the outline of the control mechanisms with the guts of the non-thread related code elided?

I'm not typing ^C and no one else is logged in.

Something is causing your process to receive a SIGINT. It may be that your SIGSTOP is being internally translated into a SIGINT by the signals emulation code--the Perl signals emulation on Windows does not directly support SIGSTOP. Or this could be some uncharted interaction between the signals emulation in the core and that layered on top by the threads signals. (Which should never have been added in the first place IMO.)

Again, if you can post your code--with most of the SNMP stuff elided--it might be possible for me to re-create the problem locally and track down the source.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^6: Perl Thread Quitting Abnormally
by Anonymous Monk on Jul 07, 2010 at 15:56 UTC

    Hi,

    Thanks for the suggestion of using a shared array for $end_ne. I was initially using a signal to kill the thread and then restart it; when I changed to just setting a variable, I didn't think to use an array. Will change that.

    You asked me to post the command and control code:

        my @hosts = keys %nes;
        my @hosts_ok;
        my $max_threads = $config->val($section,'MaxThreads');

        for my $threadno (0 .. ($max_threads-1)){
            if (@hosts){
                my $host = shift (@hosts);
                $thread_ip[$threadno] = $host;
                $thread_result[$threadno] = 0;
                $thread_time[$threadno] = time();
                push @threads, threads->create('get_ne_data',$threadno);
                $threads[@threads-1]->detach();
                write_log ('MAIN','STARTING THREADS',1,'001b','Starting thread',$threads[@threads-1]->tid(),'NE',$host);
                sleep(1);
            }
        }

        while (@hosts){
            for my $threadno (0 .. (@threads-1)){
                if ($threads[$threadno] and $threads[$threadno]->is_running()){
                    if ($thread_result[$threadno]){
                        # thread as returned
                        if ($thread_result[$threadno] == 2){
                            write_log ('MAIN','PROCESSING NES',2,'001c','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Finished OK');
                            push @hosts_ok, $thread_ip[$threadno];
                        } else {
                            write_log ('MAIN','PROCESSING NES',2,'001d','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Finished NOK');
                        }
                        $thread_time[$threadno] = time();
                        $thread_result[$threadno] = 0;
                        if (@hosts){
                            $thread_ip[$threadno] = shift @hosts;
                        } else {
                            # print "\tFINISHED\n";
                            $thread_ip[$threadno] = 'FINISHED';
                        }
                        write_log ('MAIN','PROCESSING NES',2,'001e','Thread',$threads[$threadno]->tid(),'is being assigned',$thread_ip[$threadno]);
                    } elsif ($thread_time[$threadno] < (time-60) ){
                        write_log ('MAIN','PROCESSING NES',2,'001f','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Being killed');
                        # print "\t$thread_ip[$threadno] is being killed\n";
                        $threads[$threadno]->kill('STOP');
                        $thread_time[$threadno] = time();
                        $thread_result[$threadno] = 0;
                        if (@hosts){
                            $thread_ip[$threadno] = shift @hosts;
                        } else {
                            # print "\tFINISHED\n";
                            $thread_ip[$threadno] = 'FINISHED';
                        }
                        write_log ('MAIN','PROCESSING NES',2,'0020','Thread',$threads[$threadno]->tid(),'is being assigned',$thread_ip[$threadno]);
                    }
                } else {
                    # for some reason we don't have a thread here - possibly stopped due to long run time
                    # print "\tTrying to restart thread\n";
                    if (@hosts){
                        my $host = shift (@hosts);
                        $thread_ip[$threadno] = $host;
                        $thread_result[$threadno] = 0;
                        $thread_time[$threadno] = time();
                        $threads[$threadno] = threads->create('get_ne_data',$threadno);
                        $threads[$threadno]->detach();
                        write_log ('MAIN','PROCESSING NES',4,'0021','THREAD NEEDS RESTARTING',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno]);
                    }
                }
            }
            sleep(1);
        }
        write_log ('MAIN','',2,'0022','All hosts started');

        my $threads_running = 1;
        while ($threads_running){
            $threads_running = 0;
            for my $threadno (0 .. (@threads-1)){
                if ($threads[$threadno] and $threads[$threadno]->is_running()){
                    $threads_running++;
                    if ($thread_result[$threadno]){
                        # thread as returned
                        if ($thread_result[$threadno] == 2){
                            write_log ('MAIN','CLEARUP',2,'0022','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Finished OK');
                            push @hosts_ok, $thread_ip[$threadno];
                        } else {
                            write_log ('MAIN','CLEARUP',2,'0023','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Finished NOK');
                        }
                        $thread_ip[$threadno] = 'FINISHED';
                        $thread_result[$threadno] = 0;
                    } elsif ($thread_time[$threadno] < (time-60) ){
                        write_log ('MAIN','CLEARUP',2,'0024','Thread',$threads[$threadno]->tid(),'NE',$thread_ip[$threadno],'Being killed');
                        $threads[$threadno]->kill('STOP');
                        $thread_time[$threadno] = time();
                        $thread_result[$threadno] = 0;
                        $thread_ip[$threadno] = 'FINISHED';
                    }
                }
            }
        }

      Your architecture means that your main thread is doing an awful lot of "busy work": continuously looping around, polling each of the threads to see if it has completed its current work item so that it can give it a new one. That means that when a given thread finishes one item, it must then wait until the main thread gets around to checking its state and issuing it with another host IP before it can continue doing anything useful.

      That is not a good architecture. I suspect that your main thread will be consuming a substantial amount of your CPU resources basically doing the equivalent of perpetually asking "Are we there yet? Are we there yet?".

      A thread pool + queue architecture, such as I outlined in Re: How to create thread pool of ithreads, would be more efficient and easier to reason about.
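
A minimal sketch of a thread pool fed by a queue, using the core Thread::Queue module. It is not a reproduction of the linked post. The pool size, the placeholder host addresses, and poll_host() (a stand-in for the real per-host SNMP collection) are all assumptions.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $NWORKERS = 4;                             # assumed pool size
my @hosts    = map { "10.0.0.$_" } 1 .. 12;   # placeholder addresses

my $Qwork    = Thread::Queue->new;   # main -> workers: hosts to poll
my $Qresults = Thread::Queue->new;   # workers -> main: "host status" lines

# Hypothetical stand-in for the real per-host data collection.
sub poll_host { my( $host ) = @_; return 'OK' }

sub worker {
    # dequeue() blocks until an item arrives, so an idle worker burns no CPU
    # and picks up its next host as soon as it finishes the previous one.
    while( defined( my $host = $Qwork->dequeue ) ) {
        $Qresults->enqueue( "$host " . poll_host( $host ) );
    }
}

my @pool = map { threads->create( \&worker ) } 1 .. $NWORKERS;

$Qwork->enqueue( @hosts );
$Qwork->enqueue( ( undef ) x $NWORKERS );   # one "no more work" marker per worker

$_->join for @pool;    # main simply waits; there is no polling loop

my @hosts_ok;
while( defined( my $line = $Qresults->dequeue_nb ) ) {
    my( $host, $status ) = split ' ', $line;
    push @hosts_ok, $host if $status eq 'OK';
}
printf "%d of %d hosts finished OK\n", scalar @hosts_ok, scalar @hosts;
```

With this shape the main thread no longer hands out hosts one at a time; each worker pulls its own next item from the queue. A per-host time limit is not shown here and would have to be enforced inside the worker.
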

