lpoht has asked for the wisdom of the Perl Monks concerning the following question:

Hi - I am modifying someone else's code for test automation. Basically, the script runs as a cron job on a client machine. Every time it starts, it creates a lockfile (if one does not already exist), and whenever it exits, it releases the lockfile. It is important to note that there is no bare "exit" anywhere except in the releaseLock() sub. The problem I am having is that on occasion the script simply exits for no apparent reason. It does not release the lock, so I know that it is not exiting properly but literally dying. This happens so sporadically that I can't give much more information. Can anyone suggest reasons why this might happen? I've tried to reproduce it by running the script manually or in a debugger but have not yet succeeded. Any info would be most appreciated - I don't have much hair left to pull out.

Replies are listed 'Best First'.
Re: Perl scripts quit unexpectedly
by GrandFather (Saint) on Jul 27, 2005 at 04:26 UTC

    Can you log STDERR in the test system? It may be that some error is reported as the code dies.

    If you have access to the code under test you may be able to scatter debugging print statements through the code so at least you can narrow down the places where the code is failing.
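A minimal sketch of both suggestions, assuming the script is free to reopen STDERR itself (cron can also redirect it from the crontab line). The log path and the trace() helper are hypothetical names used only for illustration:

```perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw(tempdir);
use POSIX qw(strftime);

# Hypothetical log location; a real cron job would use a fixed path.
my $log_file = tempdir(CLEANUP => 1) . "/stderr.log";

# Send everything written to STDERR (warnings, die messages) to the log.
open(STDERR, '>>', $log_file)
    or die "Cannot redirect STDERR to $log_file: $!";
STDERR->autoflush(1);    # write each line immediately so nothing is lost on a crash

# Hypothetical helper: a timestamped trace line with the caller's line number.
sub trace {
    my ($msg) = @_;
    my (undef, undef, $line) = caller;
    my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
    print STDERR "[$stamp] pid $$ line $line: $msg\n";
}

trace("starting test run");
warn "this warning also ends up in the log\n";
trace("finished test run");
```

The last trace line in the log then bounds where the script stopped.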


    Perl is Huffman encoded by design.
Re: Perl scripts quit unexpectedly
by ikegami (Patriarch) on Jul 27, 2005 at 06:10 UTC

    Some solutions, from best to worst:

    1) Why aren't you using flock? Create the lockfile if it doesn't exist, then flock it. The lock will be cleared if the process exits by any means. Don't worry about deleting the lockfile (but you can if you want).

    2) Another way is to store the locker's pid in the lockfile. To check if the lock is active, check for both the presence of the lockfile and the presence of the process mentioned therein. If the process isn't running, the lock must be stale.

    3) Use END. This will work if the script dies, but it won't work if the script is killed.

    {
        my $lock = ...;
        my $on_exit = sub { releaseLock($lock); };
        eval 'END { $on_exit->() }';
    }

    You shouldn't modify the END line at all, lest you risk introducing bugs and security problems. Make your changes to $on_exit instead.
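Option 1 can be sketched roughly as follows. This is an illustration, not the original script's code: the lockfile path and the acquire_lock() helper are hypothetical, and a forked child stands in for a second cron invocation.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical lockfile path; a real cron job would use a fixed location.
my $lock_path = tempdir(CLEANUP => 1) . "/testrun.lock";

# Hypothetical helper: returns the open filehandle if the lock was obtained,
# undef if another process already holds it.
sub acquire_lock {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    # LOCK_NB makes flock return immediately instead of waiting.
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Keep $lock in scope for the whole run; the lock lasts as long as the handle.
my $lock = acquire_lock($lock_path);
die "another instance is already running\n" unless defined $lock;

# Simulate a second instance starting while the lock is held.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child: exit code 1 if the lock was refused, 0 if it was (wrongly) granted.
    POSIX::_exit(acquire_lock($lock_path) ? 0 : 1);
}
waitpid($pid, 0);
my $child_refused = ($? >> 8) == 1;
print "second instance was ", ($child_refused ? "refused" : "allowed"), "\n";
```

No explicit release is needed: the operating system drops the lock when the handle is closed or the process ends, however it ends.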

      Thanks for the tips. I have updated the code to use flock - a much smarter method. I was thinking that an END block may be a good way to debug how this script is exiting. However, since it's not a function call I am not sure what useful information I can print out in the END block.

      Will an END block run if the process receives a SIGKILL, SIGSTOP, or SIGQUIT? If so, is there any way to log the receipt of these signals, since I can't create a handler for the first two? Thanks.
        As I understand things, SIGKILL is never received by the application, so you wouldn't be able to log that one. The whole idea behind that signal is to kill non-responsive processes, a task ill-suited for the process being killed. I'm not familiar with the other signals you mentioned.
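For the signals that can be caught, logging is straightforward. The sketch below is an assumption-laden illustration: note() and @events are hypothetical stand-ins for a real log, and the list of signals is only a plausible starting set. SIGKILL and SIGSTOP cannot be handled, so they never show up here. The END block shows one useful thing it can report, namely the pending exit status in $?.

```perl
use strict;
use warnings;

my @events;    # hypothetical in-memory stand-in for a real log file

# Hypothetical helper: record a message and echo it to STDERR.
sub note {
    my ($msg) = @_;
    push @events, $msg;
    print STDERR "$msg\n";
}

# Install a logging handler for each catchable signal of interest.
for my $sig (qw(HUP INT QUIT TERM PIPE)) {
    $SIG{$sig} = sub {
        note("pid $$ received SIG$sig");
        # A real handler would probably call exit(1) here, so that the
        # END block below still runs; omitted to keep the demo going.
    };
}

# Runs on normal exit and on die, but not when the process is killed
# by an unhandled signal.
END {
    note("END block reached, exit status $?");
}

# Demonstration: send this process a catchable signal.
kill 'HUP', $$;
note("still running after SIGHUP");
```

If neither a signal line nor the END line appears in the log, an uncatchable signal such as SIGKILL is a likely candidate.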
Re: Perl scripts quit unexpectedly
by tlm (Prior) on Jul 27, 2005 at 04:36 UTC

    That's a tough situation. The best I can recommend is to instrument your code to log as much info as possible; a logging module such as Log::Log4perl is useful for this. In this way at least you'll be able to narrow down where the code fails.

    Make sure you run the script under use warnings. (It is unusual for a Perl script to die silently in any case, but having warnings on can often provide additional clues.)

    the lowliest monk

Re: Perl scripts quit unexpectedly
by jbrugger (Parson) on Jul 27, 2005 at 04:35 UTC
    It might be any problem you can think of. At this moment you give too little information; the only thing I can suggest is to look at the logfiles, like GrandFather suggested.

    Perhaps you should give more information, like what os (bsd/linux/MacOS X), distribution, source of the script etc. etc.

    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
      As I mentioned, it's a little tricky to post code since the errors are so random. I will provide an example that just cropped up. The system is a multi-way linux box.
      my $cmd = ". $basedir/testEnv.txt; . stp_include.sh && ./wrap.sh $script_param &> ../logs/run-log.txt";
      syslog "Executing test with command $cmd";
      syslog `$cmd`;
      $ret = -1 unless ( $? == 0 );
      print "Done executing";
      This code block executes a wrapper script that calls other executables. These executables can take hours to run. Just now I had a run of this where the wrapper finished (the run-log shows it running to completion). However, I don't see either the following print or the one after the call to this function. This suggests it died sometime around the exit of the wrap.sh file.

      I am now watching the execution and I see that the parent script (the one that contains the above code) exits while the child (wrap.sh) runs. Doing a ps shows wrap.sh running but no parent. Any ideas about how to troubleshoot? As I said, these problems are not easily reproducible, so the next time I run this same code (which I will later today) it may run well or not.
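One thing that may help when the parent does survive: the check `$? == 0` above throws away the reason for a failure. Perl's $? holds the full wait status after backticks or system, and it can be decoded as in this sketch. describe_status() is a hypothetical helper, and the sh commands only simulate a child that exits or is killed:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a wait status ($?) into a readable string.
sub describe_status {
    my ($status) = @_;
    return "failed to start" if $status == -1;
    my $signal = $status & 127;    # low 7 bits: signal that killed the child
    my $core   = $status & 128;    # next bit: set if a core dump was produced
    return sprintf("killed by signal %d%s",
                   $signal, $core ? " (core dumped)" : "")
        if $signal;
    return sprintf("exited with code %d", $status >> 8);
}

# List-form system runs sh directly, so $? reflects that process.
system('sh', '-c', 'exit 3');
my $exit_report = describe_status($?);

system('sh', '-c', 'kill -TERM $$');    # the child kills itself
my $signal_report = describe_status($?);

print "first child:  $exit_report\n";
print "second child: $signal_report\n";
```

Logging this string instead of a bare -1 at least distinguishes a child that failed from one that was killed. It does not explain a parent that disappears, which is where logging signals in the parent itself comes in.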