Re: Threads, bash, and networking

Preview: Way down the bottom of this long post I suggest a 4-line subroutine, as a replacement for all the code you posted, that does everything that I believe you need. And it uses no threads, no alarms and no signals. Oh, and it will work :) But, please don't skip ahead without reading the intervening material, because if you do, you'll simply dismiss it as too simple. You won't understand the analysis that makes me believe it is all you need.

Does my thread management look okay?

Ignoring for the moment whether your ping + tcpdump strategy is necessary, the simple answer to the above question is: no!

Cutting through all the description and boiling down the code, what you have is this:

threadn {
    eval {
        sighandler = {};
        alarm 15;          ## alarm#1
        start threadn+1;
        start threadn+2;
        threadn+2->join;
        threadn+1->join;
    }
    alarm 0;
}
threadn+1 {
    eval {
        sighandler = {};
        alarm 15;          ## alarm#2
        @input = qx[ ping ];
    }
    alarm 0;
}
threadn+2 {
    @input = qx[ tcpdump ];
    return @input;
}

Some questions (in no particular order, but the semantics do vary depending upon ordering):

When alarm#1 goes off, what are you hoping to interrupt?
- Just the qx// in threadn+2?
  If so, why would it only affect that, and not the qx// in threadn+1?
- Just the threadn+2->join?
  If so, what would happen to threadn+2?
  When its qx// returns, who is going to retrieve (join) its return value and allow it to release its resources?
  Or are you anticipating that interrupting threadn+2->join is going to somehow make threadn+2 and the external process it started just "go away"?
- Or are you also hoping that the signal raised in threadn is going to somehow permeate its way into both threadn+1 and threadn+2?
  If so, how is threadn+1's signal handler going to differentiate between an alarm signal raised locally and the one raised by its parent?
  And what happens to those two external processes when the code waiting on their output stops waiting?
- What happens if threadn+2 finishes successfully before the alarm goes off?
  You then immediately go to waiting for threadn+1->join.
  But the alarm you set for threadn+2 is still running. What happens if it interrupts that join?
  It's not just possible but almost inevitable: not only do you set the alarm in threadn lexically before you set the alarm in threadn+1, there is also no guarantee that threadn+1 will even get to run before the first alarm goes off. That is unlikely, but on a heavily loaded system it is possible.
  And again, what happens to the ping process? And the output it produces? And the thread storing those results awaiting a join that just got interrupted?
What is the point of threadn+2 anyway?
If the only thing you are going to do after starting a thread is go into an immediate blocking join on that thread, you've effectively coded a set of expensive sequential statements.
I.e., you would be far better off skipping the thread entirely and placing its code in-line, thereby avoiding the thread's start-up cost, the context switches and the blocking wait.
Why give the ping process thread (threadn+1) its own signal handler and not the tcpdump thread?
If you're wondering why I've used threadn, threadn+1, threadn+2 instead of just thread0/thread1/thread2 above, here's the answer:
I assume from your description that, once you have the ability to track a single VM/DNS pairing (using 3 threads), you intend to start monitoring multiple VM/DNS pairings concurrently, by starting a threadn/n+1/n+2 set for each such pairing?
Assuming I'm correct, you then need to consider the semantics of having lots of threads, all running concurrently, raising lots of alarms intended to interrupt other threads. I hope you can see that all of the above questions, about which threads are going to be affected by which signals and what happens to all the threads and processes that get interrupted by them, suddenly compound out of control.

What I'm trying to point out is that the way you are going about this is very confused. Asking what the semantics of what you are doing should be is really beside the point, because there is no single signals-and-threads semantics that could possibly make what you are doing work in a sensible fashion.

Some people will read the above and use it as vindication that "threads are evil", completely missing the point that you'd have exactly the same set of issues if you used fork instead of threads->create().

Signals + threads semantics

What actually happens (assuming you're using relatively recent versions of Perl & threads; and upgrade now if you're not!) is pretty much defined by these two paragraphs from the relevant section of the POD:

CAVEAT: The thread signalling capability provided by this module does not actually send signals via the OS. It emulates signals at the Perl-level such that signal handlers are called in the appropriate thread. Correspondingly, sending a signal to a thread does not disrupt the operation the thread is currently working on: The signal will be acted upon after the current operation has completed. For instance, if the thread is stuck on an I/O call, sending it a signal will not cause the I/O call to be interrupted such that the signal is acted up immediately.

In essence, the simple rule is that signals do not cross thread boundaries, just as they do not cross fork boundaries. Apparently they did in early versions of threads, but that was a bug long since fixed. A mechanism has been added to threads that uses the signals nomenclature for a very limited form of "inter-thread" signalling, but as it won't actually interrupt anything (such as a blocked qx// or join), it is effectively useless for what you are trying to do.

How to fix your current code

So then we come to the point of how to do what you are trying to do. (Still skipping over the issue of whether ping + tcpdump is necessary or desirable.)

You could just move the alarm + signal handler from threadn into threadn+2 (mirroring what you've done in threadn+1). This would probably work, but as I pointed out above, there is no point in starting another thread if all the parent is going to do is immediately go into a blocking wait for it to finish.

So that suggests skipping threadn+2 entirely and moving its code back into threadn. That will probably work, but you must ensure that you cancel the alarm before going into the join on threadn+1. If you don't, you'll not only create a zombie thread (and possibly a zombie process), but you'll also lose the output that ping produces.
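For that single-thread case, the usual Perl timeout idiom (the eval/alarm pattern described in perlipc) keeps the handler, the alarm and its cancellation together, so no alarm is left armed when the next blocking call, such as the join, begins. A minimal sketch, assuming everything runs in one thread of control; with_timeout is a hypothetical helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.
# Returns the code's result list, or an empty list on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;                         # cancel before leaving the protected region
        1;
    };
    alarm 0;                             # also cancel if $code died for another reason
    return @result if $ok;
    die $@ unless $@ eq "timeout\n";     # re-throw anything that wasn't our timeout
    return;
}

my @quick = with_timeout(5, sub { return 'done' });    # finishes well inside the limit
my @slow  = with_timeout(1, sub { sleep 3; 'late' });  # interrupted after about 1 second
```

Because the alarm is always cancelled before with_timeout returns, a later blocking wait cannot be hit by a leftover alarm.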

Unless of course the output from ping is not important to what you are doing? In which case, why are you using backticks to gather it?

But then I grind to a halt, because for the life of me I don't see why you are calling tcpdump in the first place. The only thing you do with its output is print it to the screen. You never inspect it to make any decisions based upon its content, and you never pass it back to threadn's caller either. So what's its purpose?

And then I try to look at what decisions you are making, or what data you are actually gathering. The only thing I see is that, if the ping is successful, you extract $good_dns_name from the output of ping and (indirectly) return that to threadn's caller.

That again raises the question: why are you calling tcpdump at all?

The final rub is that, seeing as you never do anything useful with the output of tcpdump, if you drop it entirely you can avoid not just threadn+2 but threadn+1 as well. On top of that, you can drop all the alarms and signalling too.

By combining ping's -c count and -i interval options, you can ensure that ping only runs for a set period of time and then terminates by itself. That avoids the possibility of a zombie process, and with it the need for threadn+1, alarms and signals. You can also allow your discovery process to short-circuit, by picking up the good_dns_name immediately after a successful ping.

In other words, all the code you posted can be reduced to (something like):

sub ping {
    my $dns_name = shift;
    open my $cmd, '-|', 'ping', '-c', 15, '-i', 1, $dns_name or die "Cannot run ping: $!";
    m/PING (.*)\.et\.byu\.edu/ and return $1 while <$cmd>;
    return;
}
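One caveat with the subroutine above: Linux iputils ping prints its "PING name (address) ..." header as soon as the name resolves, before any echo reply arrives, so a regex on that header can match even when the host never answers. If "successful" should mean that at least one reply came back, a variant could remember the name from the header but only return it once a reply line appears. A minimal sketch, assuming iputils-style reply lines such as "64 bytes from vm1.et.byu.edu (10.0.0.5): icmp_seq=1 ttl=64 time=0.3 ms" (the format varies between ping implementations); first_reply_name and ping_host are hypothetical names, and parsing is kept separate from running ping so it can be checked against canned text:

```perl
use strict;
use warnings;

# Hypothetical variant: read ping output from $fh and return the short
# name from the PING header, but only once an echo reply line is seen.
sub first_reply_name {
    my ($fh) = @_;
    my $name;
    while (my $line = <$fh>) {
        if (!defined $name && $line =~ /^PING (\S+?)\.et\.byu\.edu\b/) {
            $name = $1;            # remember the name, but don't trust it yet
        }
        elsif (defined $name && $line =~ /\bbytes from\b/) {
            return $name;          # a reply arrived: short-circuit here
        }
    }
    return;                        # header only, or no output: no reply seen
}

# Hypothetical wrapper that feeds real ping output into the parser.
sub ping_host {
    my ($dns_name) = @_;
    open my $cmd, '-|', 'ping', '-c', 15, '-i', 1, $dns_name
        or die "Cannot run ping: $!";
    return first_reply_name($cmd);
}
```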

Now, I don't really know what tcpdump does. I can hazard a guess based on the name, but on the evidence of the code you've posted, whatever it does, you are not using it for anything useful. If it really is required for your solution, then explain why and I'll incorporate it into a solution. Which I guarantee will be a whole lot simpler than what you are doing currently. Better yet, it'll work :)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP an inspiration; A true Folk's Guy

Re^2: Threads, bash, and networking
by morganda (Initiate) on Nov 23, 2010 at 19:45 UTC

First off, my apologies for the long overdue reply. Various other things have demanded my attention, and I wanted to have the time to make sure I had given it a good honest effort before I reported back. The story is a success by the way.

@zentara – Thanks for the reference to the thread->kill feature. It was most helpful. I ended up going with your suggestion of killing tcpdump by its pid because I couldn't seem to end the thread without tcpdump first ending (I/O operation).

@juster & BrowserUK – I now see I had a SIGALRM failure on my part. The clarification was helpful. Arping unfortunately won't work for what I've been assigned to accomplish. Arping works on the local subnet, and there it works well. But I see two problems with using it (if my understanding is correct): it relies on a cache, which means that for one reason or another I won't always have valid information (especially since DNS management is also bad here); and arping doesn't reach across routers. If the DNS name does not apply to anything locally, I still need to know whether it is being used elsewhere, so that I don't mark it as unused.

@BrowserUK – I read everything you wrote and found A LOT of mistakes in what I coded up. Trying to answer your questions made me realize that some of my code was more wishful thinking than an actual solution. This was also evident in your use of the word “hope” in most of those questions.

Now that I almost have my head screwed on straight, let me re-clarify the problem, the reason tcpdump needs to be part of the solution, and what my solution is (although it needs some tuning still).

Problem: We need to know the location of various Virtual Machines in case they go down. Unfortunately there has been no solid naming convention and so there is a lot of confusion as to where a given virtual machine might be. Instead of taking the time to fix the naming convention I've been assigned the less efficient although quite educational task of writing a script to determine what virtual machines are running on each host.

Solution: After doing some voodoo to determine possible DNS names of the Virtual Machines located on a host, I try to ping them. A ping alone will not give me my answer, as the DNS name may be associated with some other server located elsewhere on the network. With tcpdump I can listen on the host's eth0 interface for incoming packets, and all packets going to the virtual machines are routed through eth0. So by starting an instance of tcpdump and then sending some pings through, I can be sure that the DNS name is associated with a virtual machine on that host. Then I can update the location (host) of that virtual machine in the database.

...
my $dump_thread = threads->new(\&tcp_dump, $macaddress, $dns_name);
sleep 1;
my $ping_thread = threads->new(\&ping, $dns_name);
...
$ping_thread->join();
...
# this section checks if tcpdump captured a packet and does something with it
...
my @dump_pids = split ' ', `pidof tcpdump`;   # pidof may list several pids
kill 'TERM', @dump_pids;
$dump_thread->join();
...

Notes: I had to kill the tcpdump process in order to end the thread, because the thread was blocked in an I/O operation (the particulars have already been described above). Once that was done, I was free to join the thread.
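Because pidof tcpdump lists every tcpdump process on the machine, the kill above could also stop captures belonging to other checks, for example if several VM/DNS pairings are ever probed at once. A list-form piped open returns the child's pid, so the script can signal exactly the process it started; and since the capture already runs as a separate process, it doesn't need a thread wrapped around it. A minimal sketch; start_capture and stop_capture are hypothetical names, and the tcpdump arguments in the usage comment are illustrative assumptions (-l makes tcpdump line-buffer its output, -n skips name lookups):

```perl
use strict;
use warnings;

# Hypothetical helper: start a capture command as a child process and return
# its pid plus a handle for reading its standard output. With more than one
# list element, no shell is involved, so $pid is the command itself.
sub start_capture {
    my @cmd = @_;
    my $pid = open(my $fh, '-|', @cmd)
        or die "Cannot start $cmd[0]: $!";
    return ($pid, $fh);
}

# Hypothetical helper: stop that one child, collect any remaining output,
# and reap it so it does not linger as a zombie.
sub stop_capture {
    my ($pid, $fh) = @_;
    kill 'TERM', $pid;        # signals only the process we started
    my @lines = <$fh>;        # read until EOF (the child has exited)
    close $fh;                # waits for the child and sets $?
    return @lines;
}

# Illustrative usage (tcpdump arguments are assumptions; adjust the filter):
# my ($dump_pid, $dump_fh) = start_capture('tcpdump', '-l', '-n', '-i', 'eth0', 'icmp');
# sleep 1;                              # give tcpdump time to start listening
# system('ping', '-c', 2, $dns_name);   # send a few packets through
# my @captured = stop_capture($dump_pid, $dump_fh);
```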

No, the thread for my ping operation isn't necessary. I set a timeout as well as a 2-ping count, so no matter what, it will end. I have since gone back and fixed that, but I've left it in the code posted here in case it's of some benefit to someone else.
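Without its own thread, the ping can simply run in-line with system, bounded by a count and a deadline, and its exit status can stand in for scraping its output. A minimal sketch, assuming Linux iputils ping, whose man page documents exit status 0 on success, 1 when no reply was received (or, with both -c and -w given, when fewer than count replies arrived before the deadline), and 2 on other errors; other ping implementations may behave differently, and ping_status and probe are hypothetical names:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a wait status ($?) from an iputils ping run.
sub ping_status {
    my ($wait_status) = @_;
    return 'not-run' if $wait_status == -1;    # system() could not start ping
    return 'killed'  if $wait_status & 127;    # ping died from a signal
    my $code = $wait_status >> 8;              # the program's own exit code
    return $code == 0 ? 'replied'
         : $code == 1 ? 'no-reply'
         :              'error';
}

# Hypothetical probe: up to 2 echo requests, giving up after 5 seconds.
sub probe {
    my ($dns_name) = @_;
    system('ping', '-c', 2, '-w', 5, $dns_name);
    return ping_status($?);
}
```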

Thank you all for all of your replies and references. They were very helpful, I picked up a lot, and I now know at least a small bit of Perl kung fu.
