morganda has asked for the wisdom of the Perl Monks concerning the following question:

Background: My supervisor has commissioned me to create a script that will identify which virtual machines are located on a given server. This problem arose because no naming convention was established between the virtual machine configuration name and the DNS name (e.g., a VM called taco has a DNS name of hotdog). While I think it would be better to establish a naming convention and spend a few hours renaming all the configuration files to match the DNS names, my superiors want a "clever solution".


Purpose: If a critical VM goes down, they need to know which machine it is on in order to resolve the problem(s).


What I Use: Using some voodoo, I manage to pull from the database the possible DNS names that an actual VM might be associated with. The initial goal was to use tcpdump to listen for traffic to each DNS name, timing out after, say, 15 seconds, a minute, or even 5 minutes if there is no traffic. Since this runs as a nightly cron job, there isn't much traffic, so tcpdump ends up not returning anything useful and the script concludes that the real DNS name is bogus.


Proposed Solution: I am new to thread programming, but I understand the concept to a minimal degree. The idea is to launch a series of, say, 10 pings at the DNS name and have tcpdump listen for those pings. I tried it by hand with two terminals open: I started the pinging and then started tcpdump. With the right commands and parameters in hand, I figured I could plug and chug and everything would work wonderfully.


Problem: I run tcpdump in an eval block with a $SIG{ALRM} handler to kill it after a set time if it doesn't get any packets. Unfortunately, it doesn't seem to be working. I suspect a newbish misuse of threads is causing the problem, because a previous eval statement killed tcpdump just fine, but now it won't.


Code:

sub vm_tcp_dump {
    my $macaddress = $_[0];
    my $dns_name = $_[1];
    my $good_dns_name = undef;;
    eval{
        local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
        alarm 15;
        my $ping_thread = threads->new(\&vm_ping, $dns_name);
        print ("CMD: tcpdump -q -i eth0 'ether dst host $macaddress and proto ICMP and src host $hostname' -c 1\n");
        my $dumpdata = undef;
        my $dump_thread = threads->new(sub {my $ipaddress = undef;
            $dumpdata = `tcpdump -q -i eth0 'ether dst host $macaddress and proto ICMP and src host $hostname' -c 1`;
            return $dumpdata;
        });
        $dumpdata = $dump_thread->join();
        print "\nTEST\n\n";
        print "\n".$dumpdata."\n\n";
        $good_dns_name = $ping_thread->join();
        alarm 0;
    };
    print "tcpdump done\n";
    return $good_dns_name;
}

sub vm_ping {
    my $dns_name = $_[0];
    my $good_dns_name;
    print("DNS: $dns_name\n");
    eval{
        local $SIG{ALRM} = sub {die "alarm\n" };
        alarm 15;
        my $data = `ping $dns_name -c 10 -s 256`;
        if(!($data =~ m/Destination Host Unreachable/g)){
            $data =~ m/PING (.*).et.byu.edu/g;
            $good_dns_name = $1;
            print($data."\n");
        }
        alarm 0;
    };
    print "\nDONE\n\n";
    return $good_dns_name;
}

I've tweaked and tweaked, and now I'm looking for that good ol' Perl Monks wisdom. Does my thread management look okay? The ping function does execute and finish after tcpdump is started. tcpdump just never breaks out at the 15-second timeout I set up for it.

Re: Threads, bash, and networking
by zentara (Cardinal) on Oct 22, 2010 at 17:41 UTC
    Hi, see the node "using the thread->kill() feature on linux", where we end up talking about the finely tuned features of the newer threads module. The reason I bring this up is that the man page for tcpdump says:
    Tcpdump will, if not run with the -c flag, continue capturing packets until it is interrupted by a SIGINT signal (generated, for example, by typing your interrupt character, typically control-C) or a SIGTERM signal (typically generated with the kill(1) command); if run with the -c flag, it will capture packets until it is interrupted by a SIGINT or SIGTERM signal or the specified number of packets have been processed.

    The newer threads module probably requires you to set up specific signal handlers in the thread running tcpdump, so that it can exit. Carefully read the threads documentation, especially the section on signalling... you probably need a

    $SIG{'FOO'} = sub { do_foo; };
    inside the code block that the thread runs.

    Hopefully, whatever tcpdump is doing isn't considered I/O of some sort, because if it is, signals will not work on threads stuck in I/O, from what the perldoc says. In that case you might have to fork tcpdump so as to get its pid, and then kill it by $pid.
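A minimal sketch of the fork-and-kill-by-pid idea, assuming a Unix-like system. A list-form piped open forks the command and returns its pid, so the parent can time out with alarm and then kill exactly that process. run_with_timeout is a hypothetical helper name, and in the tests sleep stands in for tcpdump so the sketch runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd, collect its output, and kill it by pid if it
# has not finished within $timeout seconds. Returns ($output, $timed_out).
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    # List-form piped open: forks, execs @cmd without a shell, returns its pid.
    my $pid = open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my ($output, $timed_out) = ('', 0);
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };    # NB: \n required
        alarm $timeout;
        local $/;                                    # slurp mode
        $output = <$fh> // '';
        alarm 0;
        1;
    };
    unless ($ok) {
        die $@ unless $@ eq "alarm\n";    # re-throw anything that is not our timeout
        $timed_out = 1;
        kill 'TERM', $pid;                # the man page says tcpdump stops on SIGTERM
    }
    close $fh;    # waits for (reaps) the child, so no zombie is left behind
    return ($output, $timed_out);
}
```

For tcpdump this would be called as something like run_with_timeout(15, 'tcpdump', '-q', '-c', '1', '-i', 'eth0', $filter), with $filter being your capture expression. The point of the design is that the parent never has to guess which process to kill.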

    I am out of votes for today, but I would have ++'d you for the well written description of your problem :-)


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Threads, bash, and networking
by VinsWorldcom (Prior) on Oct 22, 2010 at 16:08 UTC

    Certainly the renaming should come first and foremost; this is why standards exist, and you're right to point that out to your superiors.

    Don't you have some VM management software, even the default stuff you get when you buy your hypervisor software? As you take advantage of VM migration and load sharing, you're going to have a nightmare trying to manage this without good software. And there is so much out there already, rather than rolling your own with threads and tcpdump.

      I'll remember all this when I get to run the show. Unfortunately, most of this place is held together by hacks if they can indeed be worthy enough to be called hacks. Probably "tweaks" would suffice. I'm trying really hard to look at this as a good programming and learning opportunity until I graduate. And yes... it is a nightmare.

Re: Threads, bash, and networking
by juster (Friar) on Oct 22, 2010 at 18:51 UTC

    You might try using an arp ping instead of your more complicated setup. The arping utility does this. arping sends pings over ethernet asking "which mac address owns this ip address". This would eliminate the complicated step of running tcpdump.

    Edit: arping would only work if your VMs are on the local network. This might be a silly assumption for me to make what with cloud computing and all. If your VMs aren't on the local network I would suggest using a piped open and select and maybe using Net::Ping instead of the ping command.
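One way to read the "piped open and select" suggestion is sketched below, assuming a Unix-like system. read_lines_until is a hypothetical helper: it runs a command such as ping through a piped open and uses IO::Select to stop waiting at a deadline. sysread is used instead of <$fh> because buffered reads can pull data into Perl's buffer where select() cannot see it.

```perl
use strict;
use warnings;
use IO::Select;
use Time::HiRes qw(time);

# Hypothetical helper: read a command's output line by line, but stop after
# $secs seconds even if the command is still running.
sub read_lines_until {
    my ($secs, @cmd) = @_;
    my $pid = open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my $sel = IO::Select->new($fh);
    my $deadline = time + $secs;
    my ($buf, @lines) = ('');
    while ((my $left = $deadline - time) > 0) {
        last unless $sel->can_read($left);          # deadline reached, nothing to read
        my $n = sysread($fh, $buf, 4096, length $buf);
        last unless $n;                             # EOF (or read error)
        push @lines, $1 while $buf =~ s/^(.*\n)//;  # move out complete lines
    }
    push @lines, $buf if length $buf;   # keep any unterminated tail
    kill 'TERM', $pid;                  # stop the command if it is still running
    close $fh;                          # reap the child
    return @lines;
}
```

With ping this might look like read_lines_until(15, 'ping', '-c', 10, $dns_name); you would then inspect the returned lines for replies.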

    Here is a test I created for my own curiosity. The summary of my findings is:

    • SIGALRM handlers inside threads are adorably useless. Put the SIGALRM handler in the main thread.
    • The parent thread's SIGALRM handler overrides the child thread SIGALRM handler... thankfully.
    • join-ing the thread delays the SIGALRM handler until the joined thread is finished. Not helpful.
    • If you detach the thread the SIGALRM is triggered with the proper delay.
    • If you set the delay with 'alarm' inside the thread, it will override the parent thread's 'alarm'.

    Rather confusing. So basically, if you insist on using threads, detach the thread after you set the alarm handler and alarm delay in the parent thread. This is kind of silly, because you could just detach the thread and sleep, then kill the thread or whatever.

    Lose the threads, lose tcpdump and ping, and use: `arping -c 1 $host`. Easy.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use threads;

    sub set_alarm {
        $SIG{'ALRM'} = sub {
            printf "Alarm went off in thread #%d\n", threads->tid();
            print qq{"Get up, get up!", says the clock.\n};
            exit 1;
        };
        alarm shift;
    }

    # If you set the SIGALRM handler HERE and ...
    #   If you join the thread, the alarm handler goes off after 10 seconds
    #     ( The 'sleep' delay set inside the child thread. )
    #   If you detach the thread, the alarm handler goes off after 3 seconds
    #     ( The 'alarm' delay set inside the thread )
    #   Or If you don't set 'alarm' inside the thread, the outer 'alarm'
    #     delay is used.
    set_alarm( 5 );

    my $sleepy = threads->create
        ( sub {
              # If the SIGALRM handler is used in this thread
              # (comment out the set_alarm above)
              # it simply prints "Alarm clock" !hahahaha
              set_alarm( 3 );
              sleep 10;
              print "Well, well, are you joining me in bed?\n";
          } );

    printf "Main thread # is %d.\nChild thread # is %d.\n",
        threads->tid(), $sleepy->tid();

    # Try commenting out detach/sleep and uncommenting join
    # $sleepy->join;
    $sleepy->detach;
    sleep 15; # you only need to sleep when detach-ing

    print "*Fart*\n"; # Luckily this never gets reached...
    print "Oh crap I'm late for work!\n";

    Edit: To be clearer, SIGALRM handlers inside the child thread are apparently never reached. I also only ran this on Mac OS X; who knows, you could get different results on different machines (Joy!).

      To be clearer, SIGALRM handlers inside the child thread are apparently never reached.

      By that, I assume you mean that alarms raised in one thread are not caught in other threads.

      Why would you expect that they would be? Signals don't cross fork boundaries, so why expect they might cross thread boundaries?

      Remember, there is no parent-child relationship between threads, so if an alarm raised in one thread could be caught in a thread it spawned, it would also be caught by every other thread in the program.

      There would simply be no way to reason about a system that meant that every thread received every signal raised in any other thread.
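The fork half of that argument is easy to check. The sketch below, assuming a POSIX system, shows that an alarm timer and its handler belong to one process: the child arms its own alarm and handles it itself, and the parent, which never calls alarm, never sees it.

```perl
use strict;
use warnings;

# An alarm is per-process: the child arms its own timer and handler, and the
# parent (which never calls alarm) is not affected by it.
my $parent_alarms = 0;
$SIG{ALRM} = sub { $parent_alarms++ };

my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # Child process: its own handler, its own timer.
    $SIG{ALRM} = sub { exit 42 };
    alarm 1;
    sleep 5;    # interrupted by the child's alarm after about 1 second
    exit 0;     # only reached if the alarm never fired
}

waitpid($pid, 0);
my $child_exit = $? >> 8;
printf "child exited with %d; parent saw %d alarms\n", $child_exit, $parent_alarms;
```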


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        By that, I assume you mean that alarms raised in one thread are not caught in other threads.

        I mean if you comment out the top set_alarm() statement in my code, it will print "Alarm clock" and exit after 3 seconds. From this I am inferring that the created thread is successfully setting the alarm for 3 seconds. $SIG{ALRM} is set in the same created thread. The problem is the value of $SIG{ALRM} is never used. Instead the magic "Alarm clock" appears and terminates the program!

        This was also the first bullet-point of my conclusions:

        SIGALRM handlers inside threads are adorably useless. Put the SIGALRM handler in the main thread.

        So again, that wasn't my original message. But since you mentioned it: alarms raised in the created thread were indeed caught by the main thread. In fact, according to my test program anyway, this is the only way alarm signals used in threads are functional at all.

        I would avoid using alarm signals in threads altogether. After trial and error, tweaking the code I posted, I have come to understand several nuances of alarm signals and threading. That is what I was trying to share. Hopefully from all the unpredictable behavior I have mentioned, readers can conclude that using alarm inside threads is not worth the headache.

Re: Threads, bash, and networking
by BrowserUk (Patriarch) on Oct 23, 2010 at 04:19 UTC

    Preview: Way down the bottom of this long post I suggest a 4-line subroutine, as a replacement for all the code you posted, that does everything that I believe you need. And it uses no threads, no alarms and no signals. Oh, and it will work :) But, please don't skip ahead without reading the intervening material, because if you do, you'll simply dismiss it as too simple. You won't understand the analysis that makes me believe it is all you need.

    Does my thread management look okay?

    Ignoring for the moment whether your ping + tcpdump strategy is necessary, the simple answer to the above question is: no!

    Cutting through all the description and boiling down the code, what you have is this:

    threadn {
        eval {
            sighandler = {};
            alarm 15;            ## alarm#1
            start threadn+1;
            start threadn+2;
            threadn+2->join
            threadn+1->join
        }
        alarm 0;
    }

    threadn+1 {
        eval {
            sighandler = {};
            alarm 15;            ## alarm#2
            @input = qx[ ping ];
        }
        alarm 0;
    }

    threadn+2 {
        @input = qx[ tcpdump ];
        return @input;
    }

    Some questions (in no particular order, though the semantics do vary depending upon the ordering):

    • When alarm#1 goes off, what are you hoping to interrupt?
      • Just the qx// in threadn+2?

        If so, why would it only affect that, and not the qx// in threadn+1?

      • Just the threadn+2->join?

        If so, what would happen to threadn+2?

        When its qx// returns, who is going to retrieve (join) its return value and allow it to release resources?

        Or are you anticipating that interrupting threadn+2->join is going to somehow make threadn+2 and the external process it started just "go away"?

      • Or are you also hoping that the signal raised in threadn is going to somehow permeate its way into both threadn+1 and threadn+2?

        If so, how is threadn+1's signal handler going to differentiate between an alarm signal raised locally and the one raised by its parent?

        And what happens to those two external processes when the code waiting on their output stops waiting?

      • What happens if threadn+2 finishes successfully before the alarm goes off?

        You then immediately go to waiting for threadn+1->join.

        But the alarm you set for threadn+2 is still running. What happens if it interrupts that join?

        It's not just possible but almost inevitable: not only do you set the alarm for threadn lexically before you set the alarm in threadn+1, there is also no guarantee that threadn+1 will even get to run before the first alarm goes off. That is unlikely, but on a heavily laden system it is possible.

        And again, what happens to the ping process? And the output it produces? And the thread storing those results awaiting a join that just got interrupted?

    • What is the point of threadn+2 anyway?

      If the only thing you are going to do after starting a thread, is going into an immediate blocking join on that thread, you've effectively coded expensive sequential statements.

      I.e., you would be far better off skipping the thread entirely and placing its code in-line, thereby avoiding the thread start-up costs, context switches and blocking waits.

    • Why give the ping process thread (threadn+1) its own signal handler and not the tcpdump thread?
    • If you're wondering why I've used threadn, threadn+1, threadn+2 instead of just thread0/thread1/thread2 above, here's the answer:

      I assume from your description that once you have the ability to track a single VM/DNS pairing (using 3 threads), you intend to start monitoring multiple VM/DNS pairings concurrently, by starting a threadn/n+1/n+2 set for each such pairing?

      Assuming I'm correct, you then need to consider the semantics of having lots of threads raising lots of alarms intended to interrupt other threads all running and going off concurrently. I hope you can see that all of the above about what threads are going to be affected by what signals; along with what happens to all the threads and processes that get interrupted by them; suddenly compounds out of control.

    What I'm trying to point out is that the way you are going about this is very confused. The question of what the semantics of your code should be is really irrelevant, because there is no single signals-and-threads semantics that could possibly make what you are doing work in a sensible fashion.

    Some people will read the above and use it as vindication that "threads are evil", completely missing the point that you'd have exactly the same set of issues if you used fork instead of threads->create().

    Signals + threads semantics

    What actually happens (assuming you're using relatively recent versions of Perl & threads; and upgrade now if you're not!), is pretty much defined by these two paragraphs from the relevant section of the POD:

    CAVEAT: The thread signalling capability provided by this module does not actually send signals via the OS. It emulates signals at the Perl-level such that signal handlers are called in the appropriate thread. Correspondingly, sending a signal to a thread does not disrupt the operation the thread is currently working on: The signal will be acted upon after the current operation has completed. For instance, if the thread is stuck on an I/O call, sending it a signal will not cause the I/O call to be interrupted such that the signal is acted up immediately.

    In essence, the simple rule is that signals do not cross thread boundaries, just as they do not cross fork boundaries. Apparently they did in early versions of threads, but that was a bug long since fixed. A mechanism has been added to threads that uses the signals nomenclature for a very limited form of "inter-thread" signalling, but as it won't actually interrupt anything, it is effectively useless.

    How to fix your current code

    So then we come to the question of how to do what you are trying to do (still skipping over the issue of whether ping + tcpdump is necessary or desirable).

    You could just move the alarm + signal handler from threadn into threadn+2 (mirroring what you've done in threadn+1). This would probably work, but as I pointed out above, there is no point in starting another thread if all the parent is going to do is immediately go into a blocking wait for it to finish.

    So that suggests skipping threadn+2 entirely and moving its code back into threadn. That will probably work, but you must ensure that you cancel the alarm before going into the join on threadn+1. If you don't, you'll not only create a zombie thread (and possibly a zombie process), but you'll also lose the output that ping produces.

    Unless of course the output from ping is not important to what you are doing? In which case, why are you using backticks to gather it?
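A minimal sketch of "cancel the alarm before you go on to wait for anything else". with_timeout is a hypothetical helper, not part of any module: it runs a code block under an alarm and clears the alarm on every exit path, so nothing is left pending to go off later in the middle of a join or waitpid.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code under an alarm of $secs seconds.
# Returns (1, @results) if $code finished in time, or (0) if it timed out.
sub with_timeout {
    my ($secs, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };    # NB: \n required
        alarm $secs;
        @result = $code->();
        alarm 0;    # cancel before leaving the protected region
        1;
    };
    my $err = $@;
    alarm 0;        # and cancel again in case $code died for some other reason
    return (1, @result) if $ok;
    die $err unless $err eq "alarm\n";    # re-throw errors that are not our timeout
    return (0);
}
```

In the layout discussed above, only the capture step would go inside with_timeout; the join on the ping thread would come after it returns, when no alarm can still be pending.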

    But then I grind to a halt, because for the life of me I don't see why you are calling tcpdump in the first place. The only thing you do with its output is print it to the screen. You never inspect it to make any decisions based upon its content. And you never pass it back to threadn's caller either. So what's its purpose?

    And then I look at what decisions you are making, or what data you are actually gathering. The only thing I see is that if the ping is successful, you extract $good_dns_name from the output of ping and (indirectly) return that to threadn's caller.

    That again raises the question: why are you calling tcpdump at all?

    The final rub is that, since you never do anything useful with the output of tcpdump, if you drop it entirely you can avoid not just threadn+2 but threadn+1 as well. On top of that, you can drop all the alarms and signalling too.

    By combining -c (count) and -i (interval), you can not only ensure that ping runs only for a set period of time, so that it terminates by itself (avoiding the possibility of a zombie process, and removing the need for threadn+1 and for alarms and signals); you can also allow your discovery process to short-circuit by discovering the good_dns_name immediately after a successful ping.

    In other words, all the code you posted can be reduced to (something like):

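    sub ping {
        my $dns_name = shift;
        # List-form piped open: run ping (no shell) and read its output line by line
        open my $cmd, '-|', 'ping', '-c', 15, '-i', 1, $dns_name or die "Cannot run ping: $!";
        my $name;
        while (<$cmd>) {
            $name = $1 if m/^PING (\S+?)\.et\.byu\.edu/;     # header line: name resolved
            return $name if defined $name and m/bytes from/; # first reply: host is up
        }
        return;    # no reply within 15 pings
    }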
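The line-scanning logic can be exercised without a network by feeding canned ping output through an in-memory filehandle. This is a sketch assuming Linux iputils-style output (a "PING name ..." header, then "... bytes from ..." reply lines); first_reply_name is a hypothetical name, and the sample lines in the tests are illustrative, not captured output.

```perl
use strict;
use warnings;

# Hypothetical helper: scan ping-style output from any filehandle. The short
# host name comes from the "PING name.<domain> ..." header, but it is only
# returned once a reply line ("... bytes from ...") shows the host answered.
# Assumes Linux iputils-style output; other ping implementations may differ.
sub first_reply_name {
    my ($fh, $domain) = @_;
    my $name;
    while (my $line = <$fh>) {
        $name = $1 if $line =~ /^PING (\S+?)\.\Q$domain\E\b/;
        return $name if defined $name && $line =~ /bytes from/;
    }
    return;    # EOF without a reply
}

# Against real ping (not run here):
#   open my $cmd, '-|', 'ping', '-c', 15, '-i', 1, $dns_name or die $!;
#   my $good_dns_name = first_reply_name($cmd, 'et.byu.edu');
```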

    Now, I don't really know what tcpdump does. I can hazard a guess based on the name, but on the evidence of the code you've posted, whatever it does, you are not using it for anything useful. If it really is required for your solution, then explain why and I'll incorporate it into a solution. Which I guarantee will be a whole lot simpler than what you are doing currently. Better yet, it'll work :)



      First off, my apologies for the long overdue reply. Various other things have demanded my attention, and I wanted to have the time to make sure I had given it a good honest effort before I reported back. The story is a success by the way.

      @zentara – Thanks for the reference to the thread->kill feature. It was most helpful. I ended up going with your suggestion of killing tcpdump by its pid because I couldn't seem to end the thread without tcpdump first ending (I/O operation).

      @juster & BrowserUk: I now see I had a SIGALRM fail on my part. The clarification was helpful. Arping unfortunately won't work for what I've been assigned to accomplish. Arping works on the subnet and, for its part, works well. I see two problems with it (if my understanding is correct): first, it relies on a cache, which means that for one reason or another I won't always have valid information (especially since DNS management is also bad here); second, arping doesn't reach across routers. If the DNS name does not apply to anything locally, I still need to know whether it is being used elsewhere so I don't mark it as unused.

      @BrowserUK – I read everything you wrote and found A LOT of mistakes in what I coded up. Trying to answer your questions made me realize that some of my code was more wishful thinking than an actual solution. This was also evident in your use of the word “hope” in most of those questions.


      Now that I almost have my head screwed on straight, let me re-clarify the problem, the reason tcpdump needs to be part of the solution, and what my solution is (although it needs some tuning still).

      Problem: We need to know the location of various Virtual Machines in case they go down. Unfortunately there has been no solid naming convention and so there is a lot of confusion as to where a given virtual machine might be. Instead of taking the time to fix the naming convention I've been assigned the less efficient although quite educational task of writing a script to determine what virtual machines are running on each host.

      Solution: After doing some voodoo to determine the possible DNS names of the virtual machines located on a host, I try to ping them. A ping alone will not give me my answer, as the DNS name may be associated with some other server located elsewhere on the network. With tcpdump I can listen on the host's eth0 interface for incoming packets; all packets going to the virtual machines are routed through eth0. So by starting an instance of tcpdump and sending some pings through, I can be sure that the DNS name is associated with a virtual machine on that host. Then I can update the location (host) of that virtual machine in the database.

      Code (As you can see it's far more simplified and makes much more sense. At least to me. And it has been working.):
      ...
      my $dump_thread = threads->new(\&tcp_dump, $macaddress, $dns_name);
      sleep 1;
      my $ping_thread = threads->new(\&ping, $dns_name);
      ...
      $ping_thread->join();
      ...
      # this section checks if tcpdump captured a packet and does something with it
      ...
      chomp(my $dump_pid = `pidof tcpdump`);
      kill 'TERM', split ' ', $dump_pid;    # pidof may print several pids
      $dump_thread->join();
      ...

      Notes: I had to kill the tcpdump process in order to end the thread because it's considered an I/O operation (particulars for this have already been described above). After I completed that I was free to join the thread.

      No, the thread for my ping operation isn't necessary: I set a timeout as well as a count of 2 pings, so it will end no matter what. I went back to fix it, but I've left it as posted here in case it is of some benefit to someone else.
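For comparison, here is a sketch of the same listen-then-ping idea without threads or pidof, assuming a Unix-like system. listen_while is a hypothetical helper: it starts the listener with a piped open (which returns the listener's own pid), runs an action such as the pings, then stops that one process. The sleep 1 mirrors the delay in the code above, giving the listener time to start.

```perl
use strict;
use warnings;

# Hypothetical helper: start $listener (an array ref holding a command and its
# arguments), run $action while it listens, then stop the listener by the pid
# that open() returned and give back whatever it printed.
sub listen_while {
    my ($listener, $action) = @_;
    my $pid = open(my $fh, '-|', @$listener)
        or die "Cannot start @$listener: $!";
    sleep 1;                  # give the listener a moment to start capturing
    $action->();              # e.g. send the pings
    kill 'TERM', $pid;        # stops only this listener, not every tcpdump on the host
    local $/;                 # slurp everything it wrote before exiting
    my $output = <$fh> // '';
    close $fh;                # reap the child
    return $output;
}
```

A tcpdump call might look like listen_while(['tcpdump', '-l', '-q', '-c', '1', '-i', 'eth0', $filter], sub { system('ping', '-c', '2', $dns_name) }), where $filter is your capture expression. tcpdump's -l option makes its output line-buffered, so each captured line is written to the pipe as it arrives.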

      Thank you all for your replies and references. They were very helpful, I picked up a lot, and I now know at least a small bit of Perl kung fu.