HA node check

On AIX HA clusters (and probably other clusters too), when a node fails, the other node (in a two-node cluster) takes over the failed node's hostname, IP, etc. If you have a cron job, for example, that needs to run on a particular node, regardless of that node's state, the easiest way I have found is to run 'netstat -i' and grep for the node name. For example, say I have a script that I want to run every hour on node1. I make a crontab entry on both (or all) nodes in the cluster to run the script every hour. Node2 will start to run the script, hit the HA check and exit. If node1 fails, node2 takes over node1's name and IP, so the HA check succeeds on node2 because it now holds node1's name and IP as well.

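sub HA_Check {
        # Split the 'netstat -i' lines that mention node1 on whitespace;
        # field 3 is the Address column of the first matching line.
        my @netstat = split(' ', `netstat -i | grep node1`);
        # Bail out unless this host currently holds node1's name.
        if (!defined $netstat[3] || $netstat[3] !~ /node1/) {
                die "Not running as node1, exiting\n";
        }
}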
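
The check above relies on the first grep hit having node1 in its fourth field, and /node1/ would also match names such as node1hb or node10. A stricter variant might look like the sketch below. It is only a sketch: the function name `holds_node_name` is hypothetical, and it assumes the Address column is the fourth whitespace-separated field, as in the sample `netstat -i` output further down this thread.

```perl
use strict;
use warnings;

# Return 1 if any line of `netstat -i` output carries exactly $node in
# its Address column (assumed to be the fourth whitespace-separated
# field), otherwise 0.
sub holds_node_name {
    my ($netstat_text, $node) = @_;
    for my $line (split /\n/, $netstat_text) {
        my @field = split ' ', $line;
        return 1 if defined $field[3] && $field[3] eq $node;
    }
    return 0;
}

# Hypothetical use at the top of the cron script:
#   my $out = `netstat -i`;
#   exit 0 unless holds_node_name($out, 'node1');
```

Exiting with status 0 and no message, rather than calling die, keeps cron from mailing an error every hour from the node that is correctly skipping the job. Whether that is preferable depends on how you monitor the cluster.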

Re: HA node check by Beatnik (Parson) on Dec 12, 2001 at 14:40 UTC
    if (((split(' ',`netstat -i | grep node1`))[3]) !~ /node1/) { die }

might be somewhat shorter. ~~This, of course, is puzzling me since you're grepping for lines with node1 and then matching/dying on lines that don't contain it.~~ Update: Changed that `split` according to blakem's node below.

Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: Re: HA node check by blakem (Monsignor) on Dec 13, 2001 at 00:37 UTC
Splitting on `/ /` is different from splitting on `' '`. I believe that change would break the original program.

From split:

As a special case, specifying a PATTERN of space (' ') will split on white space just as `split' with no arguments does. Thus, `split(' ')' can be used to emulate awk's default behavior, whereas `split(/ /)' will give you as many null initial fields as there are leading spaces. A `split' on `/\s+/' is like a `split(' ')' except that any leading whitespace produces a null first field. A `split' with no arguments really does a `split(' ', $_)' internally.

Here is an example:

    #!/usr/bin/perl -wT
    use strict;
    $_ = "   a   b   c   ";
    print "split(): $_\n" for split();
    print "split(' '): $_\n" for split(' ');
    print "split(/ /): $_\n" for split(/ /);
    __END__
    =head1 OUTPUT
    split(): a
    split(): b
    split(): c
    split(' '): a
    split(' '): b
    split(' '): c
    split(/ /): 
    split(/ /): 
    split(/ /): 
    split(/ /): a
    split(/ /): 
    split(/ /): 
    split(/ /): b
    split(/ /): 
    split(/ /): 
    split(/ /): c

-Blake
Re: Re: HA node check by coec (Chaplain) on Dec 12, 2001 at 19:48 UTC
That's the whole point :) The netstat -i on node1 will return something like

    Name  Mtu   Network   Address             Ipkts Ierrs     Opkts Oerrs  Coll
    en0   1500  link#2    0.4.ac.3e.65.22 169651762     0 150335201     0     0
    en0   1500  172.22.6  node1           169651762     0 150335201     0     0
    en2   1500  link#3    0.4.ac.3e.15.54   1387003     0   1389752     0     0
    en2   1500  172.22.7  node1hb           1387003     0   1389752     0     0
    en1   9000  link#4    0.4.ac.7c.95.ec 232558546     0  76198992     0     0
    en1   9000  172.22.5  node1r          232558546     0  76198992     0     0
    <loop back interfaces snipped>

and a netstat -i on node2 will return something like

    Name  Mtu   Network   Address             Ipkts Ierrs     Opkts Oerrs  Coll
    en0   1500  link#2    0.4.ac.3e.65.4a  36127938     0  37211569     0     0
    en0   1500  172.22.6  node2            36127938     0  37211569     0     0
    en2   1500  link#3    0.4.ac.3e.65.c6   1412752     0   1364210     0     0
    en2   1500  172.22.7  node2hb           1412752     0   1364210     0     0
    en1   9000  link#4    0.4.ac.7c.97.2f  76199094     0 232558588    11     0
    en1   9000  172.22.5  node2r           76199094     0 232558588    11     0
    <loop back interfaces snipped>

Here node? is the primary interface, node?hb is the heartbeat link and node?r is the redundant link that node2 will use to pretend to be node1 if node1 fails, in which case interface en1 will no longer be named node2r but node1. So, if node1 is up, the test will fail on node2 (which is what we want), but if node1 is dead, the test will succeed on node2, which, again, is what we want. I hope that makes more sense now.
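
To make the three cases concrete, here is a small sketch that applies the original split-and-match test to captured text instead of a live `netstat -i`. The function name `ha_check_passes` is hypothetical. The sample lines are abbreviated from the outputs above, and the takeover sample is an assumed state (en1 renamed from node2r to node1, counters left unchanged for illustration).

```perl
use strict;
use warnings;

# Emulate `netstat -i | grep node1` on captured text, then apply the
# original test: the fourth whitespace-separated field must match node1.
sub ha_check_passes {
    my ($netstat_text) = @_;
    my $grepped = join '', grep { /node1/ } split /^/, $netstat_text;
    my $field = (split ' ', $grepped)[3];   # undef when grep found nothing
    return defined $field && $field =~ /node1/ ? 1 : 0;
}

my $node1_up = <<'END';
en0   1500  link#2    0.4.ac.3e.65.22 169651762 0 150335201 0 0
en0   1500  172.22.6  node1           169651762 0 150335201 0 0
END

my $node2_normal = <<'END';
en0   1500  172.22.6  node2   36127938 0  37211569  0 0
en1   9000  172.22.5  node2r  76199094 0 232558588 11 0
END

# Assumed post-takeover state: en1 now answers to node1's name.
my $node2_takeover = <<'END';
en0   1500  172.22.6  node2   36127938 0  37211569  0 0
en1   9000  172.22.5  node1   76199094 0 232558588 11 0
END

printf "%-16s %s\n", $_->[0], ha_check_passes($_->[1]) ? 'run' : 'skip'
    for [ 'node1 (up)',       $node1_up ],
        [ 'node2 (normal)',   $node2_normal ],
        [ 'node2 (takeover)', $node2_takeover ];
```

On node2 in its normal state the emulated grep returns nothing, so the fourth field is undefined and the job is skipped. On node1, and on node2 after takeover, the fourth field is node1 and the job runs.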