Preview: Way down the bottom of this long post I suggest a 4-line subroutine, as a replacement for all the code you posted, that does everything that I believe you need. And it uses no threads, no alarms and no signals. Oh, and it will work :) But, please don't skip ahead without reading the intervening material, because if you do, you'll simply dismiss it as too simple. You won't understand the analysis that makes me believe it is all you need.
Does my thread management look okay?
Ignoring for the moment whether your ping + tcpdump strategy is necessary, the simple answer to the above question is: no!
Cutting through all the description and boiling down the code, what you have is this:
threadn {
    eval {
        sighandler = {};
        alarm 15; ## alarm#1
        start threadn+1;
        start threadn+2;
        threadn+2->join;
        threadn+1->join;
    }
    alarm 0;
}
threadn+1 {
    eval {
        sighandler = {};
        alarm 15; ## alarm#2
        @input = qx[ ping ];
    }
    alarm 0;
}
threadn+2 {
    @input = qx[ tcpdump ];
    return @input;
}
Some questions (in no particular order, though the semantics do vary depending upon the ordering):
- When alarm#1 goes off, what are you hoping to interrupt?
- Just the qx// in threadn+2?
If so, why would it only affect that, and not the qx// in threadn+1?
- Just the threadn+2->join?
If so, what would happen to threadn+2?
When its qx// returns, who is going to retrieve (join) its return value and allow it to release its resources?
Or are you anticipating that interrupting threadn+2->join is going to somehow make threadn+2 and the external process it started just "go away"?
- Or are you also hoping that the signal raised in threadn is going to somehow permeate its way into both threadn+1 and threadn+2?
If so, how is threadn+1's signal handler going to differentiate between an alarm signal raised locally and the one raised by its parent?
And what happens to those two external processes when the code waiting on their output stops waiting?
- What happens if threadn+2 finishes successfully before the alarm goes off?
You then immediately go to waiting for threadn+1->join.
But the alarm you set for threadn+2 is still running. What happens if it interrupts that join?
It's not just possible, but almost inevitable: not only do you set the alarm for threadn lexically before you set the alarm in threadn+1, there is also no guarantee that threadn+1 will even get to run before the first alarm goes off. That last is unlikely, but on a heavily laden system it is possible.
And again, what happens to the ping process? And the output it produces? And the thread storing those results awaiting a join that just got interrupted?
- What is the point of threadn+2 anyway?
If the only thing you are going to do after starting a thread is go into an immediate blocking join on that thread, you've effectively coded expensive sequential statements.
I.e. you would be far better off skipping the thread entirely and placing its code in-line, thereby avoiding the thread's start-up costs, context switches and blocking waits.
- Why give the ping process thread (threadn+1) its own signal handler and not the tcpdump thread?
- If you're wondering why I've used threadn, threadn+1, threadn+2 instead of just thread0/thread1/thread2 above, here's the answer:
I assume from your description that once you have the ability to track a single VM/DNS (using 3 threads), you intend to start monitoring multiple VM/DNS pairings concurrently, by starting a "threadn/n+1/n+2" for each such pairing?
Assuming I'm correct, you then need to consider the semantics of having lots of threads raising lots of alarms intended to interrupt other threads, all running and going off concurrently. I hope you can see that all of the above questions, about which threads are going to be affected by which signals and what happens to all the threads and processes that get interrupted by them, suddenly compound out of control.
What I'm trying to point out is that the way you are going about this is very confused. The question of what the semantics of what you are doing should be is really irrelevant, because there is no single signals-and-threads semantic that could possibly make what you are doing work in a sensible fashion.
Some people will read the above and use it as vindication that "threads are evil", completely missing the point that you'd have exactly the same set of issues if you were using fork instead of threads->create().
Signals + threads semantics
What actually happens (assuming you're using relatively recent versions of Perl & threads; and upgrade now if you're not!), is pretty much defined by these two paragraphs from the relevant section of the POD:
CAVEAT: The thread signalling capability provided by this module does not actually send signals via the OS. It emulates signals at the Perl-level such that signal handlers are called in the appropriate thread.
Correspondingly, sending a signal to a thread does not disrupt the operation the thread is currently working on: The signal will be acted upon after the current operation has completed. For instance, if the thread is stuck on an I/O call, sending it a signal will not cause the I/O call to be interrupted such that the signal is acted up immediately.
In essence, the simple rule is that signals do not cross thread boundaries--just as they do not cross fork boundaries. Apparently they did in early versions of threads, but that was a bug long since fixed. A mechanism has been added to threads that uses the signals nomenclature for a very limited form of "inter-thread" signalling, but as it won't actually interrupt anything, it is effectively useless.
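To make that POD caveat concrete, here is a rough sketch, assuming a perl built with ithreads and a reasonably recent threads module. The 2-second sleep merely stands in for a blocking qx//, and the $ready flag, the choice of USR1 and the variable names are my own inventions for the illustration. The parent "signals" the child with $thr->kill while the child is blocked, then times how long the join takes.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(time);

my $ready :shared = 0;

my $thr = threads->create( sub {
    my $seen = 0;
    $SIG{USR1} = sub { $seen = 1 };   # installed in this thread's interpreter

    {   # Tell the parent the handler is in place before it "signals" us.
        lock($ready);
        $ready = 1;
        cond_signal($ready);
    }

    sleep 2;          # stands in for a blocking qx// or read
    return $seen;     # any pending handler runs before we get here
} );

{   # Wait until the child has installed its handler.
    lock($ready);
    cond_wait($ready) until $ready;
}

my $sent = time;
$thr->kill('USR1');   # emulated, Perl-level signal; nothing is sent via the OS
my $seen   = $thr->join;
my $waited = time - $sent;

printf "handler ran: %s; join returned %.1fs after the kill\n",
    ( $seen ? 'yes' : 'no' ), $waited;
```

If the POD is accurate, I would expect the handler to run, but only once the sleep has finished of its own accord, so the join should come back roughly two seconds after the kill rather than immediately.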
How to fix your current code
So then we come to the point of how to do what you are trying to do. (Still skipping over the issue of whether ping + tcpdump is necessary or desirable.)
You could just move the alarm + signal handler from threadn into threadn+2 (mirroring what you've done in threadn+1). This would probably work, but as I pointed out above, there is no point in starting another thread if all the parent is going to do is immediately go into a blocking wait for it to finish.
So that suggests skipping threadn+2 entirely and moving its code back into threadn. That will probably work, but you must ensure that you cancel the alarm before going into the join on threadn+1. Because if you don't, you'll not only create a zombie thread (and possibly a zombie process), but you'll also lose the output that ping produces.
Unless of course the output from ping is not important to what you are doing? In which case, why are you using backticks to gather it?
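On the point about cancelling the alarm before moving on to anything else: a minimal sketch of keeping an alarm scoped to just one blocking call, in a single thread, might look like the following. The helper name with_timeout and its return convention are made up for this illustration, and the code refs are stand-ins for the real qx// calls. Note that it does nothing about cleaning up an external process that is still running when the time limit expires.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a time limit of $seconds.
# Returns ( $ok, @results ); $ok is 0 if the limit expired.
sub with_timeout {
    my ( $seconds, $code ) = @_;
    my @results;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };   # "\n" keeps $@ exact
        alarm $seconds;
        @results = $code->();
        alarm 0;          # cancel as soon as the guarded call returns
        1;
    };
    alarm 0;              # also cancel if $code died for some other reason
    unless ($ok) {
        die $@ unless $@ eq "timeout\n";   # re-throw unrelated errors
        return (0);
    }
    return ( 1, @results );
}

# A call that finishes in time, and one that does not.
# select() is used as the slow stand-in because mixing sleep and alarm
# is not portable.
my ( $fast_ok, @fast ) = with_timeout( 5, sub { return ( 'a', 'b' ) } );
my ($slow_ok) = with_timeout( 1, sub { select( undef, undef, undef, 3 ); return 'late' } );

print "fast: ok=$fast_ok results=@fast\n";
print "slow: ok=$slow_ok\n";
```

Because the alarm is set and cleared inside the one helper, no timer is left pending when the caller moves on to its next blocking operation.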
But then I grind to a halt, because for the life of me I don't see why you are calling tcpdump in the first place? The only thing you do with its output, is print it to the screen. You never inspect it to make any decisions based upon its content. And you never pass it back to threadn's caller either. So what's its purpose?
And then I try to look at what decisions you are making, or what data you are actually gathering. The only thing I see is that if the ping is successful, you extract $good_dns_name from the output of ping and (indirectly) return that to threadn's caller.
That again raises the question: why are you calling tcpdump at all?
The final rub is that, seeing how you never do anything useful with the output of tcpdump, if you drop that entirely you can avoid not just threadn+2, but threadn+1 also. On top of that, you can drop all the alarms and signalling as well.
By combining -c count and -i interval, you can ensure that ping only runs for a set period of time and then terminates by itself. That avoids the possibility of a zombie process, the need for threadn+1, and the need for alarms and signals. You can also allow your discovery process to short-circuit by returning the good_dns_name immediately after a successful ping.
In other words, all the code you posted can be reduced to (something like):
sub ping {
    my $dns_name = shift;
    # '-|' runs the command and lets us read its output through a pipe
    open my $cmd, '-|', "ping -c 15 -i 1 $dns_name" or die "Cannot run ping: $!";
    m/PING (.*)\.et\.byu\.edu/ and return $1 while <$cmd>;
    return;
}
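If you want to try the matching logic without touching the network, a sketch along these lines may help. The helper name name_from_ping_output is hypothetical, and the sample text is made up to resemble typical ping output; the exact format on your system may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured host name from the first
# matching line read from $fh, or nothing if no line matches.
sub name_from_ping_output {
    my ($fh) = @_;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ m/PING (.*)\.et\.byu\.edu/;
    }
    return;
}

# Made-up sample output; the host name and addresses are illustrative only.
my $sample = <<'END';
PING vm42.et.byu.edu (192.0.2.10) 56(84) bytes of data.
64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.41 ms
END

# Read the sample through an in-memory filehandle instead of a pipe.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $name = name_from_ping_output($fh);
print defined $name ? "found: $name\n" : "no match\n";
```

The same function would accept the pipe handle opened on the real ping command, since it only relies on being able to read lines from whatever handle it is given.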
Now, I don't really know what tcpdump does. I can hazard a guess based on the name, but on the evidence of the code you've posted, whatever it does, you are not using it for anything useful. If it really is required for your solution, then explain why and I'll incorporate it into a solution. Which I guarantee will be a whole lot simpler than what you are doing currently. Better yet, it'll work :)
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.