mbr has asked for the wisdom of the Perl Monks concerning the following question:

Hi -

I have written a perl program that runs as a daemon on my Linux system, but unfortunately the program dies sporadically (once every three weeks or so of continuous execution) and I'm trying to find out why. I use the -w switch and also strict, but I have not been able to consistently make the program die, so it is difficult to troubleshoot. The program seems to die under both perl 5.6.0 and 5.6.1. I have not tried running the program under the perl debugger, since 99% of the time it seems to run perfectly and I can't reproduce the conditions needed to cause any problems. Can you offer any general advice on a method to debug the problem? Perhaps running the program under the debugger might offer some insight anyway? Maybe there is a way to make perl print a stack trace if the process receives a SEGV signal or something?

In case you are curious, the program is the "port scan attack detector (psad)" which has recently been integrated with Bastille Linux, is released under the GPL, and is available here: http://www.cipherdyne.com

Thanks,
--mbr

Replies are listed 'Best First'.
Re: debugging strategy?
by hossman (Prior) on Mar 17, 2002 at 05:21 UTC
    There are a lot of things at the URL you listed, but assuming you were talking about this, the first thing I notice is that the only signals you are catching consistently are USR1, SEGV and CHLD. You might also try registering a handler for __DIE__ and the rest -- it could prove enlightening.
      Thanks for the __DIE__ handler suggestion. I went and read some in the Camel book and that looks like a really cool idea. Now I just have to make the handler give some useful info... I guess the first thing would be to log STDERR somewhere?
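      A minimal sketch of what such a handler setup might look like (the log path, message format and handler names here are invented for illustration; note that a pure-Perl SEGV handler may never get a chance to run if the interpreter itself is what crashes):

```perl
#!/usr/bin/perl -w
use strict;
use Carp;

# hypothetical error-log path -- adjust for your setup
my $err_log = '/tmp/psad_errs.log';

sub log_fatal {
    my $msg = shift;
    # open/append/close each time so the file is never held open
    if (open my $fh, '>>', $err_log) {
        print $fh scalar(localtime), " $msg\n";
        # Carp::longmess gives a full Perl-level stack trace
        print $fh Carp::longmess('stack trace:');
        close $fh;
    }
}

$SIG{__DIE__} = sub { log_fatal("die: $_[0]") };
$SIG{SEGV}    = sub { log_fatal('caught SIGSEGV'); exit 1 };
$SIG{TERM}    = sub { log_fatal('caught SIGTERM'); exit 0 };
```

      One caveat: a __DIE__ handler also fires for die's caught by eval, so you may want to check $^S inside the handler if that gets noisy.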
Re: debugging strategy?
by grep (Monsignor) on Mar 17, 2002 at 05:37 UTC

    As hossman points out, a __DIE__ handler would be helpful, since I noticed you do have some die's, but you are only sending the messages to STDERR, not to your log files.

    It also looks like you started to use syslogd but then went back to your own logfiles. I would recommend moving to syslogd; you'll be able to avoid some of the pitfalls of straight logfiles (e.g. multiple instances of your program, disk space, permissions, not being able to monitor remotely without NFS). There are some CPAN modules providing an OO interface to syslogd: Net::Syslog, Tie::Syslog, Unix::Syslog



    grep
    grep> cd /pub
    grep> more beer
      Maybe I should use syslogd to print diagnostic info? That would make better sense than my current strategy. I started to use Sys::Syslog but didn't really stick with it since at the time there seemed to be some problems using it with perl-5.005_03. I should definitely revisit this. Thanks.
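      For reference, the core Sys::Syslog interface is only a few calls; a minimal sketch (the 'psad' ident and the example message are placeholders, and on some older systems you may also need setlogsock('unix') before openlog):

```perl
#!/usr/bin/perl -w
use strict;
use Sys::Syslog;

# ident 'psad', log the pid with each message, use the daemon facility
openlog('psad', 'pid', 'daemon');

# printf-style formatting is built in
syslog('info',    'started, pid %d', $$);
syslog('warning', 'possible scan from %s', '10.0.0.1');  # example address

closelog();
```

      With this, rotation, permissions and remote logging all become syslogd's problem rather than the daemon's.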
Re: debugging strategy?
by dws (Chancellor) on Mar 17, 2002 at 05:15 UTC
    You might run your daemon from a shell window (instead of daemonizing it). It might emit some useful information when it crashes. Or load the whole thing up in gdb and let it run.

Re: debugging strategy?
by jplindstrom (Monsignor) on Mar 17, 2002 at 18:52 UTC
    Two words: Log everything.

    Since you can't reproduce the error and see what's happening, you have to keep looking at all times in order to be there when the bug strikes; you need a ridiculously detailed debug log file.

    You can probably afford to delete it at regular intervals when things go right, because it's bound to grow fast. gzipping it works as well. A cron entry might be good for this.

    Deleting a log file while the daemon still holds it open won't actually free the disk space, so open, print, and close at each log entry; that way the file is never held open and can be rotated or deleted safely at any time.

    Just make sure you can log this much without affecting efficiency before you deploy :)


    /J

    PS. Log files are a Good Thing. Consider using a general logging strategy to keep track of what the program does.
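    The open/print/close-per-entry approach above can be sketched as a small helper (the log path and call sites are invented for illustration):

```perl
#!/usr/bin/perl -w
use strict;

my $debug_log = '/tmp/psad_debug.log';   # placeholder path

# Open, print, and close on every call, so the daemon never holds the
# file open and cron can rotate/gzip/delete it at any time.
sub dlog {
    my $msg = shift;
    open my $fh, '>>', $debug_log or return;  # never die over logging
    print $fh scalar(localtime), " [$$] $msg\n";
    close $fh;
}

# example call sites, sprinkled ridiculously liberally:
dlog('entering packet-parse loop');
dlog("matched rule: $_") for ('rule-1', 'rule-2');
```

    The cost is one open/close pair per entry, which is why it's worth measuring the overhead before deploying.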

Re: debugging strategy?
by BeernuT (Pilgrim) on Mar 17, 2002 at 05:21 UTC
    EDITED: found the url

    -bn