MonkeyMonk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

As someone new to threads, I got a lot of help from here (and the resident expert) and now have a working model. The original post is here:

Should I use threads? Perl/DHCP/Radius

To summarize: 3 threads exist. One thread reads log files for IP leases and parses the MAC. The parsed MAC is used to spawn a new thread which checks a server and, when the server sends an HTTP 200 OK, performs an authentication by calling the localhost; the thread then detaches. The 3rd thread simply talks to the localhost to check which MACs are connected and holds/deletes them in a hash which all 3 threads share.

Some other issues related to 5.8.4 version were posted here: Perl5.8.4 , threads - Killing the "async" kid

I have since made two deployments and things are working satisfactorily. The daemon is part of the system start up sequence. All parts of the code seemed to have been proven.

There is, however, one irritant. The daemon sometimes dies, forcing me to restart it manually. There is no pattern: sometimes it dies within a few minutes, sometimes after a few hours, and sometimes it runs until the system is restarted again (24-hour cycle).

I tried the following:

  • Enclosed all the threads in eval blocks and printed $@ to my log file, as suggested in Programming Perl.
  • Installed signal handlers that write to my log file.
  • Started the daemon from a terminal with
    >> lease.log 2>&1 &
    appended at the end.

None of these has written anything at all to the log file when the daemon dies unexpectedly. I repeat that there is no pattern here.

Any suggestions on how to get it to log something before it dies? Personally, I am looking at writing a wrapper around the daemon which checks its status once in a while. Has anyone done anything like this before? I am looking for advice in general. Please note: upgrading the kernel or Perl is not possible at all. I have looked into everything, strengthened the code with locks, ensured clean variable usage etc. Still the murderer (and the motive) are unknown, making me sleepless.
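A wrapper of the kind described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (describe_exit, run_once, supervise) are hypothetical, and it assumes a POSIX system where the daemon runs in the foreground as a child of the wrapper. Its main value is that the wait status tells you whether the daemon exited on its own or was killed by a signal, even when the daemon itself logged nothing.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Turn a wait status (as found in $?) into a human-readable reason.
sub describe_exit {
    my ($status) = @_;
    return "could not be started" if $status == -1;
    my $signal = $status & 127;                       # low 7 bits: terminating signal
    my $core   = $status & 128 ? " (core dumped)" : "";
    return "killed by signal $signal$core" if $signal;
    return "exited with code " . ( $status >> 8 );    # high byte: exit code
}

# Run the command once as a child process, wait for it,
# and return its wait status (-1 if fork failed).
sub run_once {
    my (@cmd) = @_;
    my $pid = fork();
    return -1 unless defined $pid;
    if ( $pid == 0 ) {
        # Child: replace ourselves with the daemon; exit 127 if exec fails.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    waitpid( $pid, 0 );
    return $?;
}

# Start the command up to $max_starts times, logging why each run ended.
# $delay is the pause in seconds between restarts (0 for none).
sub supervise {
    my ( $log_fh, $max_starts, $delay, @cmd ) = @_;
    my @reasons;
    for my $attempt ( 1 .. $max_starts ) {
        my $reason = describe_exit( run_once(@cmd) );
        printf {$log_fh} "%s run %d: daemon %s\n",
            strftime( "%Y-%m-%d %H:%M:%S", localtime ), $attempt, $reason;
        push @reasons, $reason;
        sleep $delay if $delay;    # avoid a tight restart loop
    }
    return @reasons;
}
```

It might be called as supervise( \*STDERR, 5, 10, '/path/to/daemon' ), where the path is a placeholder. A log line such as "killed by signal N" points at an unhandled signal rather than a Perl-level die, which would explain why eval blocks and $@ never recorded anything.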

    Re: Coroner not helping - Perl5.8.4,threadsv1.03 and the case of myterious deaths
    by BrowserUk (Patriarch) on Oct 21, 2010 at 12:01 UTC

      It is very unusual for Perl to "die" without issuing some kind of notification.

      Looking back at your original post, I notice that you have use strict twice but no use warnings or -w. If that is still the case, enable warnings. -w or even -W on the shebang line might be a good idea. It just might cause something to get logged.

      If that doesn't help, my next suggestion is to use Devel::Trace. ie. prefix the run command with

      perl -d:Trace yourscript ... >> lease.log 2>&1 &

      Your code will run much more slowly, and your log file will become huge. But, assuming it fails, the log will tell you what was happening when it dies.

      This is my modified Trace.pm that adds thread numbers to the trace output and performs locking on STDERR. The latter requires you to create a shared semaphore variable in main: our $semSTDERR :shared;

      # -*- perl -*-
      package Devel::Trace;
      use threads;
      use threads::shared;
      use Data::Dump qw[ pp ];

      $VERSION = '0.10';
      $TRACE = 1;
      $VARS = 0;

      # This is the important part. The rest is just fluff.
      sub DB::DB {
          local $\;
          return unless $TRACE;
          my ($p, $f, $l) = caller;
          my $code = \@{"::_<$f"};
          lock( $::semSTDERR ) if defined $::semSTDERR;
          printf STDERR ">> [%d] %-30s:%5d: %s",
              ( threads->self->tid || 'n/a' ),
              ( $f || 'n/a' ),
              ( $l || -1 ),
              ( +$code->[$l] || "???\n" );
          return unless $VARS;
          $code->[$l] =~ s/
              ( [\$@%] [#]? \w* )
              (?: ( (?:->)? (?: [\[{] [^\]}] ++ [\]}] )+ ) )?
          /
              no warnings 'uninitialized';
              my $var = (defined $2 ? "$1$2" : $1 );
              eval qq[ printf "\$var := %s\n", do{ package $p; $var }; ];
              $1 . ( $2 || '' )
          /gex;
      }

      sub import {
          my $package = shift;
          foreach (@_) {
              if ($_ eq 'trace') {
                  my $caller = caller;
                  *{$caller . '::trace'} = \&{$package . '::trace'};
              } else {
                  use Carp;
                  croak "Package $package does not export `$_'; aborting";
              }
          }
      }

      my %tracearg = ('on' => 1, 'off' => 0);
      sub trace {
          my $arg = shift;
          $arg = $tracearg{$arg} while exists $tracearg{$arg};
          $TRACE = $arg;
      }

      1;

      The chances are that if this isn't environmental--process limits or the like--then it is probably a bug long since fixed. I can only emphasise again how much better it would be if you would upgrade your Perl. Not doing so leaves you vulnerable to 6 years of bugs already fixed, with nowhere to go.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Hi. I was out of action for a few days owing to sickness. Thanks for the tips. I smacked my forehead for not setting up warnings (I thought I had). I do not know what happened after that. I was so fed up with the daemon dying that I just decided to look into the trickiest part. Consider this code, which is part of one of the spawned threads:

        iprint " MAC: $mac : Contacting API......\n" if $debug > 1;

        # Wrap call to API with eval
        eval {
            # The handler body is cut off in the post as it survives;
            # only the start of its message remains.
            local $SIG{ALRM} = sub { iprint "Network Delay" };
            alarm( $maxNetworkDelay );
            $ua       = LWP::UserAgent->new( timeout => $maxNetworkDelay );
            $response = $ua->request( HTTP::Request->new( GET => $AuthURLforMAC . $mac ) );
            $respcon  = $response->content;
            alarm(0);
        };
        iprint " MAC: $mac $@" if $@;

        (I know you had mentioned earlier that LWP::Simple would do the job, but I needed to look specifically at the RC codes in the response, which LWP::Simple does not support.) 2 threads use the above code for 2 different URLs, and the response is examined to decide the further course of action.

        I initially thought that the ALRM signal would be trapped in the eval block and the local signal handler would do the job. Apparently, the $maxNetworkDelay alarm was being triggered when network delays occurred, and that actually sends a signal to all threads. This was causing the death of the daemon.

        By simply adding a $SIG{ALRM} = sub { return; } at the very top of the daemon, before any thread definition, I was able to recover. So when the ALRM is triggered, all threads receive it and return to the state they were in earlier. Modified code below, with the $SIG{ALRM} handler defined as common for all threads and the local handler removed.

        iprint " MAC: $mac : Contacting API......\n" if $debug > 1;

        # Wrap call to API with eval
        eval {
            $ua       = LWP::UserAgent->new( timeout => $maxNetworkDelay );
            $response = $ua->request( HTTP::Request->new( GET => $AuthURLforMAC . $mac ) );
            $respcon  = $response->content;
        };
        iprint " MAC: $mac $@" if $@;
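The effect of a top-level handler can be seen in isolation, without threads. The following is a minimal sketch, assuming a POSIX system; alarm_outcome is a hypothetical helper, not part of the daemon. A process whose SIGALRM disposition is the default is terminated by the signal without running any further Perl code, so nothing reaches a log file; with any handler installed, even an empty one, it survives. How the signal is routed between threads on 5.8.4 is not shown here.

```perl
use strict;
use warnings;

# Fork a child that arms a 1-second alarm and then sleeps for 3 seconds.
# Returns the child's wait status so the caller can see how it ended.
sub alarm_outcome {
    my ($install_handler) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: either install an empty handler or force the default action.
        $SIG{ALRM} = $install_handler ? sub { return } : 'DEFAULT';
        alarm(1);
        sleep 3;    # interrupted by SIGALRM after about one second
        exit 0;     # only reached if the signal did not kill the process
    }
    waitpid( $pid, 0 );
    return $?;
}

# Low 7 bits of the status hold the terminating signal number (0 if none).
printf "default disposition: terminated by signal %d\n", alarm_outcome(0) & 127;
printf "handler installed:   wait status %d\n",          alarm_outcome(1);
```

If the daemon's own alarm() fires while no handler is in effect in the context that receives the signal, this silent termination is one plausible reading of the deaths described above.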

        For the first time the daemon has run for 2 days without a problem. Did I know why I wrote the handler at the top? Partly yes, because I knew all threads were getting the signals in 5.8.4.

        Do I know what is happening precisely? Definitely not. I would appreciate your views on this.

          Apparently, the $maxNetworkDelay alarm was being triggered when network delays occurred, and that actually sends a signal to all threads. This was causing the death of the daemon.

          This makes no sense to me at all. I've (briefly) scoured the LWP::UserAgent code, and can see nowhere that it raises a signal. Not as a result of a timeout, nor anything else.

          Did I know why I wrote the handler at the top? Partly yes, because I knew all threads were getting the signals in 5.8.4.

          Do I know what is happening precisely? Definitely not. I would appreciate your views on this.

          I've recently learned that this (frankly weird) behaviour of signals and threads is different on (at least one variant of) *nix, than it is on Windows. Even with recent versions of both Perl & threads.

          As such, whilst I hope my contributions so far have helped you reach a solution, I do not feel qualified to comment further on this. Sorry.

