Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I work with programs that interact across different servers, and recently one of them has been leaving processes behind after exiting in certain ways (that is to say, when I've given it the wrong input and had to close it manually). This is on a server used by several people, though, and I'm never really sure when I've left something behind. Also, closing things by hand was difficult, as I had to check several nodes for processes.

That's the intro - I've written a script to go to each node, find the right processes, make sure they're not supposed to be there, and then kill the offenders. Thing is, I'm not so good with Perl, so I figured I'd put this up to see if anyone has critiques so I can learn.

#!/bin/perl
use strict;
use warnings;

# Known problems:
# -Issues for runs started around the new year
# -Broken runs started at the same time as long
#  good runs will continue as long as the good run
# if the 'ps' output changes order, than may cancel the wrong process

# Get the currently in-process run times - grep is to specify only actual run lines
my $rundate = `/opt/lsdyna/License/lstc_qrun | grep @ | cut -c 57-68`;

#cleandate subroutine is at the bottom of the file
&cleandate($rundate);

#splitting the dates into an array - allows easier 'foreach' loops
my @rundates = split(/\s+/,"$rundate");

my @nodes = ("node2", "node3", "node4", "node5", "node6", "node7", "node8", "node9", "node10");

#this main, big loop allows each node to be checked
foreach my $nodenum (@nodes) {
    print "Testing $nodenum: ";

    #pull the mpp processes from the node, then take and clean process ids
    my $procid = `ssh $nodenum ps -eo pid,lstart,cmd | grep mpp | cut -c 1-5`;
    $procid =~ s/ //g;
    my @procids = split(/\s+/,"$procid");
    # print "@procids\n";

    #These two are purely for output, make sure that it's running
    my $procnum = scalar(@procids);
    print "$procnum processes found.\n";

    #pull the mpp processes from the node, then take and clean process dates
    my $procdate = `ssh $nodenum ps -eo pid,lstart,cmd | grep mpp | cut -c 11-22`;
    &cleandate($procdate);
    my @procdates = split(/\s+/,"$procdate");

    #This is the comparison loop; each of the dates in @procdates is compared to
    #each of the dates in @rundates. If all of the comparisons are more than an
    #hour apart (tested through $counter), then the process is killed. $procloop
    #is just an index, so that the corresponding part of $procid is killed.
    my $procloop = 0;
    my $counter = 0;
    foreach my $pd (@procdates) {
        $counter = 0;
        foreach my $rd (@rundates) {
            my $timedif = abs($pd-$rd);
            if ($timedif >= 60) { $counter++ }
        }
        if ($counter == scalar(@rundates)) {
            print "$procids[$procloop] is outside expected time range.\n" ;
            my $cmd = "ssh $nodenum kill -9 $procids[$procloop]";
            #print is to see if anything is killed, and make sure command syntax is correct
            print "$cmd\n";
            system ($cmd);
        }
        $procloop++;
    }
}

sub cleandate {
    #This takes the Mon DD HR:MN format and makes it into a numeric value for comparison
    #first line adds leading 0 to single-digit dates
    $_[0] =~ s/\s(\d\s)/0$1/g;
    #change month to a numeric value
    $_[0] =~ s/Jan/01/g;
    $_[0] =~ s/Feb/02/g;
    $_[0] =~ s/Mar/03/g;
    $_[0] =~ s/Apr/04/g;
    $_[0] =~ s/May/05/g;
    $_[0] =~ s/Jun/06/g;
    $_[0] =~ s/Jul/07/g;
    $_[0] =~ s/Aug/08/g;
    $_[0] =~ s/Sep/09/g;
    $_[0] =~ s/Oct/10/g;
    $_[0] =~ s/Nov/11/g;
    $_[0] =~ s/Dec/12/g;
    #remove colons
    $_[0] =~ s/://g;
    #remove spaces
    $_[0] =~ s/ //g;
    #page breaks from grepping are left, to allow splitting above.
}

Re: Killing Processes Code Review
by ahmad (Hermit) on Sep 11, 2010 at 04:18 UTC

    I did not understand what you are trying to do exactly.

    Are you trying to kill the process if it's been running for more than 1 hour? Or are you killing the process if it doesn't match a given date/time you saved in a local file?

    Does your code actually do the job? The process killing part might kill 1 process only ... is that what you really want? Here's the part of your code that might match only once:

    if ($counter == scalar(@rundates)) {
        print "$procids[$procloop] is outside expected time range.\n" ;
        my $cmd = "ssh $nodenum kill -9 $procids[$procloop]";
        #print is to see if anything is killed, and make sure command syntax is correct
        print "$cmd\n";
        system ($cmd);
    }

    You can also avoid some of your known problems, such as "# if the 'ps' output changes order, than may cancel the wrong process", by not running separate ssh commands: use 1 command per node and parse the output yourself.
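The "1 command, parse it yourself" suggestion could look roughly like the sketch below. It assumes the `ps -eo pid,lstart,cmd` columns used in the posted script, with `lstart` printed as `Fri Sep 10 14:03:22 2010`. The names `parse_ps_line`, `parse_ps_output` and `mpp_processes_on` are hypothetical, not part of the original script. Because each pid is read from the same line as its start time, the two can no longer drift apart if the `ps` output changes order between calls.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Month abbreviations as printed by ps, mapped to timelocal's 0-based months.
my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Parse one line of `ps -eo pid,lstart,cmd` output, for example
#   " 4321 Fri Sep 10 14:03:22 2010 mpp971 i=input.k"   (made-up sample)
# Returns a hash ref with pid, start (epoch seconds) and cmd, or nothing
# if the line does not look like a process line (e.g. the header line).
sub parse_ps_line {
    my ($line) = @_;
    my ($pid, $mon, $day, $hh, $mm, $ss, $year, $cmd) = $line =~
        /^\s*(\d+)\s+\w{3}\s+(\w{3})\s+(\d{1,2})\s+(\d\d):(\d\d):(\d\d)\s+(\d{4})\s+(.*)$/
        or return;
    return unless exists $MONTH{$mon};
    my $start = timelocal($ss, $mm, $hh, $day, $MONTH{$mon}, $year);
    return { pid => $pid, start => $start, cmd => $cmd };
}

# Parse a whole ps listing and keep only commands matching $pattern.
sub parse_ps_output {
    my ($text, $pattern) = @_;
    return grep { $_->{cmd} =~ $pattern }
           map  { parse_ps_line($_) }
           split /\n/, $text;
}

# Hypothetical wrapper: a single ssh call per node instead of two.
sub mpp_processes_on {
    my ($node) = @_;
    my $text = qx(ssh $node ps -eo pid,lstart,cmd);
    return parse_ps_output($text, qr/mpp/);
}
```

Each returned record carries `pid`, `start` and `cmd` together, so the later comparison loop can work on one list instead of two parallel arrays and an index.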

      Sorry for not responding until now - should have thought about putting this up on a Friday. But to answer:

      Are you trying to kill the process if it's been running for more than 1 hour? Or are you killing the process if it doesn't match a given date/time you saved in a local file?

      A little of both. I get times from the first grep command, then compare them to the second and third grep results. If they differ by more than an hour, then it should kill the process. I put the time difference in because the ps command was giving times from the future, so I thought some margin would be good.
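The "differ by more than an hour" test described here can be sketched on epoch seconds rather than on the packed month/day/hour/minute numbers that `cleandate` produces. This is only a sketch: `is_orphan` and the 3600-second margin are assumptions for illustration, and it presumes the start times have already been converted to epoch seconds. One reason to prefer seconds is that the packed numbers are not evenly spaced: 14:59 and 15:01 on Sep 10 are 2 minutes apart but differ by 42 as numbers, while 14:00 and 15:00 differ by 100, so a threshold of 60 does not correspond to one hour.

```perl
use strict;
use warnings;

# True if $start (epoch seconds) is more than $margin seconds away from
# every entry in @$run_starts, i.e. no known run accounts for the process.
# Note: with an empty run list every process counts as an orphan.
sub is_orphan {
    my ($start, $run_starts, $margin) = @_;
    my @near = grep { abs($start - $_) <= $margin } @$run_starts;
    return @near == 0;
}

# The packed-number pitfall, using the MMDDHHMM values cleandate would give:
my $packed_gap_2min  = 9101501 - 9101459;   # 14:59 -> 15:01, numeric gap 42
my $packed_gap_1hour = 9101500 - 9101400;   # 14:00 -> 15:00, numeric gap 100
```

With this shape the margin is a single readable number (3600 for one hour), and the year boundary listed under "Known problems" stops being a special case, since epoch seconds keep increasing across it.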

      Does your code actually do the job? The process killing part might kill 1 process only ... is that what you really want?

      I only want it to kill processes outside of the time buffer - so even if it doesn't actually kill anything, that's fine. The goal is just to get those accidentally left behind, not the ones actually in use. And it does seem to work, at least based on the tests I could come up with. Not knowing why processes continue makes testing hard.

      parse the output yourself

      The idea with this is to be automated, and possibly even to run automatically at given points. It's not really that much work to do the whole code's function manually, but it needs to be done regularly.
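For a script that is meant to run unattended, one possible safeguard is a dry-run switch: always print the `ssh ... kill` command, as the posted script already does, but only execute it when explicitly asked. The sketch below assumes that approach. `kill_command` and `reap` are hypothetical names, and the list form of `system` is used so that node names and pids are never passed through a local shell.

```perl
use strict;
use warnings;

# Build the argument list for killing one pid on one node.
# The default signal 9 mirrors the `kill -9` in the posted script.
sub kill_command {
    my ($node, $pid, $signal) = @_;
    $signal = 9 unless defined $signal;
    die "bad pid '$pid'\n" unless $pid =~ /^\d+$/;
    return ('ssh', $node, 'kill', "-$signal", $pid);
}

# Print every command; run it only when $really is true.
# Each target is an array ref [ $node, $pid ].
# Returns the targets whose command failed (always empty in a dry run).
sub reap {
    my ($really, @targets) = @_;
    my @failed;
    for my $target (@targets) {
        my @cmd = kill_command(@$target);
        print "@cmd\n";
        next unless $really;
        system(@cmd) == 0 or push @failed, $target;
    }
    return @failed;
}
```

A scheduled run could call `reap(1, ...)` while a manual check calls `reap(0, ...)` to see what would be killed first.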