Failboat is an in-house tool we've written to keep track of how we're doing with our backups. It generates a nice HTML-formatted email showing us which systems are having ongoing problems, which systems aren't being covered, and why the jobs themselves are failing.

Failboat takes two arguments: a NetBackup server to query, and an email address to send the results to. The box that runs Failboat must be designated as a valid peer in NetBackup; otherwise the query commands that Failboat relies upon won't work. In other words, this script should work when run on any NetBackup master server, media server, or client with authorization to run admin commands, but nowhere else.

By the way, the first few lines of code contain an HTML reference to an internal server (delphi) that hosts a nice logo displayed at the top of the email. Feel free to change it to:

http://home.comcast.net/~bpoag/failboat-logo3.jpg

Code:

#!/usr/bin/perl
##
## Failboat 0.1 written 020910 by Bowie J. Poag
##
## Failboat tracks which NetBackup jobs have failed every night, and warns if a particular client
## had had problems for more than 2 days running.
##
## Failboat is a Negative Nancy. It has nothing good to say about anything, or anybody. Infact,
## it attempts to point out lapses in work ethic on the part of the backup administrator. But,
## that's it's job.
##
## Being so negative all the time has really taken a toll on Failboat. He's chronically depressed.
## I found out a week or so ago that Failboat has been seeing a psychiatrist for his problems..
## It doesn't help much that Failboat is going through a nasty divorce and custody battle on
## top of it all. Talk about stress.. That bitch put him through as bankruptcy a few years ago,
## and now he can't get a loan to fix up the house in order to sell it before she gets her hands
## on it once the divorce is finalized. It's just ridiculous. Poor Failboat. Don't even get me
## started on the kids.. His alimony is going to be through the roof.
##
## Regardless, attitude is everything in this business, and if you have a crappy attitude, it
## just makes the work harder. I gotta give 'ol Failboat some credit for trying, tho. As if having
## to work with NetBackup wasn't soul-crushing enough.....
##

use Date::Manip;
use Mail::Sendmail;

$DEBUG=0;
$now=&ParseDate("today");
$mailRecipients=$ARGV[1];
$masterServer=$ARGV[0];

spinUp();
collectJobs();
parseJobs();
spinDown();

sub spinUp()
{
    push(@console, "<html><body bgcolor=#000000><font size=2 face=\"verdana\" color=#31bbad><img src=\"http://delphi/failboat-logo3.jpg\" align=left><br /><br/><br /><br><br>");
    push(@console, "<br>Report generated ".`date`."<br><br>");
    print "\nFailboat: Spinning up..\nFailboat:\n";
    -e ("/usr/openv/netbackup/bin/admincmd/bpdbjobs") or die "Failboat: Cannot find bpdbjobs binary. Nothing to do.\n\n";
}

sub collectJobs()
{
    print "Failboat: Collecting data from NetBackup master server $masterServer..\n";
    push(@console,"Failboat: Collecting data from NetBackup master server $masterServer..<br>");
    @tempJobsTable=`/usr/openv/netbackup/bin/admincmd/bpdbjobs -M $masterServer | grep -v Catalog`;
}

sub parseJobs()
{
    print "Failboat:\nFailboat: The following is a list of clients that are currently experiencing problems with their backups:\nFailboat:\n";
    push(@console, "Failboat:<br>Failboat: The following is a list of clients that are currently experiencing problems with their backups:<br>Failboat:<br>");

    foreach $item (@tempJobsTable)
    {
        $item=~s/\s+/ /g;
        @thisJob=split(" ",$item);

        if ($thisJob[3]>1 && $thisJob[0]>1)
        {
            $DEBUG && print "Failboat: $item\n";
            $DEBUG && push(@console,"Failboat: $item<br>");
            $failedClients{$thisJob[6]}++;
            if ($failureType{$thisJob[6]}=="") { $failureType{$thisJob[6]}=$thisJob[3]; }
        }
    }

    $DEBUG && print "\n";

    while (($client, $failCode)=each(%failureType))
    {
        $DEBUG && print "Failboat: Client $client most recently failed with error code $failCode.\n";
        $DEBUG && push(@console, "Failboat: Client $client most recently failed with error code $failCode.<br>");
    }

    $DEBUG && print "\n";

    while (($client, $failCount)=each(%failedClients))
    {
        $DEBUG && print "Failboat: Client $client has failed $failCount times in recent history.\n";
        $DEBUG && push(@console, "Failboat: Client $client has failed $failCount times in recent history.<br>");
    }

    while (($client, $failCount)=each(%failedClients))
    {
        if ($failCount>1)
        {
            $lastValidBackupTime=`/usr/openv/netbackup/bin/admincmd/bpcatlist -client $client 2>&1 | grep $client | head -n1 | awk '{print $2}'`;
            $lastValidBackupTime=~s/\s+/ /g;
            @temp=split(" ",$lastValidBackupTime);
            $lastValidBackupTime="$temp[1] $temp[2] $temp[3] $temp[4]";

            $errorExplanation="with an unrecognized error code ($failureType{$client})";
            if ($failureType{$client}==21) { $errorExplanation="because a socket could not be opened"; }
            if ($failureType{$client}==40) { $errorExplanation="because the network connection was broken"; }
            if ($failureType{$client}==41) { $errorExplanation="because the network connection timed out"; }
            if ($failureType{$client}==50) { $errorExplanation="because the client backup process aborted"; }
            if ($failureType{$client}==58) { $errorExplanation="because the client was unresponsive"; }
            if ($failureType{$client}==59) { $errorExplanation="because access to the client wasn't allowed"; }
            if ($failureType{$client}==63) { $errorExplanation="because the backup process was killed client-side"; }
            if ($failureType{$client}==71) { $errorExplanation="because none of the specified files were found"; }
            if ($failureType{$client}==84) { $errorExplanation="because there was a write error on the tape"; }
            if ($failureType{$client}==90) { $errorExplanation="because media manager didn't recieve any data"; }
            if ($failureType{$client}==98) { $errorExplanation="because there was a problem with loading the tape"; }
            if ($failureType{$client}==150) { $errorExplanation="because the job was manually cancelled by the backup admin"; }
            if ($failureType{$client}==156) { $errorExplanation="because there was a snapshot error on the client"; }
            if ($failureType{$client}==196) { $errorExplanation="because the job wasn't able to start on time"; }

            if($lastValidBackupTime=~/\d/)
            {
                $delta=&DateCalc($lastValidBackupTime,$now,\$err);
                @time=split(":",$delta);
                $age=$time[3]+($time[2]*7);
                $hoursAgo=$time[4]+($time[5]/60);
                $hoursAgo=$time[4]+($time[5]/60);
                $hoursAgo=int($hoursAgo+.5);

                if ($age>=2)
                {
                    print "Failboat: Client $client hasn't had a good backup since $lastValidBackupTime, $age days ago.\n";
                    push (@console, "Failboat: Client <font color=#81f3ed>$client</font> hasn't had a good backup since $lastValidBackupTime, $age days ago.<br>");
                    push (@reasons, "Failboat: The last backup attempt on $client failed $errorExplanation.");
                }
                else
                {
                    print "Failboat: Client $client has been failing occasionally, but had a successful backup about $hoursAgo hours ago.\n";
                    push (@console, "Failboat: Client <font color=#81f3ed>$client</font> has been failing occasionally, but had a successful backup about $hoursAgo hours ago.<br>");
                    push (@reasons, "Failboat: The last backup attempt on $client failed $errorExplanation.");
                }
            }
            else
            {
                print "Failboat: Client $client doesn't have any valid backup images whatsoever. This is bad.\n";
                push (@console, "Failboat: Client <font color=#81f3ed>$client</font> doesn't have any valid backup images whatsoever. This is bad.<br>");
                push (@reasons, "Failboat: The last backup attempt on $client failed $errorExplanation.");
            }
        }
    }

    print "Failboat: \n";
    push(@console,"Failboat: <br>");
    push(@console,"Failboat: Reasons for the failures:<br>");
    push(@console,"Failboat: <br>");

    foreach $item (@reasons)
    {
        print "$item\n";
        $item=~s/attempt on /attempt on <font color=#81f3ed>/;
        $item=~s/failed/<\/font> failed/;
        $item=$item."<br>";
        push (@console, $item);
    }
}

sub spinDown()
{
    chomp($dateStamp=`date`);
    $subjectLine="Failboat report for $dateStamp";
    print "Failboat:\nFailboat: Scan completed at $dateStamp. Spinning down..\n";
    push(@console,"Failboat:<br>Failboat: Scan completed at $dateStamp. Spinning down..<br>");
    push(@console, "<br>End of report.<br><br><br><br><br><font color=#0d2522 size=1>Failboat v1.04 written 022410:1133 by Bowie J. Poag </font></body></html>");
    print "Failboat: Sending report to $mailRecipients..\n\n";

    $mail{'SMTP'} = 'mail.tmcaz.com';
    $mail{'FROM'} = 'Failboat <sysmon@foobar.com>';
    $mail{'TO'} = $mailRecipients;
    $mail{'SUBJECT'} = $subjectLine;
    $mail{'CONTENT-TYPE'} = 'text/html; charset="us-ascii"';
    $mail{'MESSAGE'} = join("",@console);
    (sendmail %mail) || print "Send failed: $Mail::Sendmail::error<br>";
}


Replies are listed 'Best First'.
Re: Failboat -- An Emotionally Disturbed Tool For Checking NetBackup Client Coverage
by jwkrahn (Abbot) on Mar 19, 2010 at 02:51 UTC
    spinUp();
    collectJobs();
    parseJobs();
    spinDown();

    sub spinUp() {

    Why the subroutines? All the variables are global and nothing is passed to them so it's not to protect local variables. They are only called once at the beginning of the script so it's not to reuse code. What is the point?


    foreach $item (@tempJobsTable) {
        $item=~s/\s+/ /g;
        @thisJob=split(" ",$item);

    If the first argument to split is a string containing a single space character, then split breaks the string on runs of whitespace and ignores any leading whitespace, so collapsing the whitespace with a substitution first is redundant. You can achieve the same result like this:

    foreach (@tempJobsTable) {
        @thisJob=split;

    if ($failureType{$thisJob[6]}=="") { $failureType{$thisJob[6]}=$thisJob[3]; }

    You are using a numerical comparison operator on a string so perl will conveniently convert that string to the number 0 to perform the comparison. Perhaps you meant to use the eq string comparison operator instead? Or the exists or defined functions?
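The difference between the two operators, and the exists alternative, can be sketched as follows. This is a minimal illustration rather than the script's actual code: the hash contents, the client names, and the helper record_first_failure are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical data standing in for %failureType: client name => status code.
my %failureType = ( alpha => 58 );

# Record a status code only the first time a client is seen.
# exists() asks "is there an entry for this key?" without comparing values.
sub record_first_failure {
    my ($client, $status) = @_;
    $failureType{$client} = $status unless exists $failureType{$client};
    return $failureType{$client};
}

my $kept  = record_first_failure('alpha', 41);   # entry already present, left alone
my $added = record_first_failure('beta',  71);   # new entry is stored

# For contrast: == converts both sides to numbers, so a non-numeric
# string and the empty string both become 0 and compare as equal.
my $numeric_says_equal = do { no warnings 'numeric'; "abc" == "" };
my $string_says_equal  = "abc" eq "";

print "alpha=$kept beta=$added\n";
print "numeric: ", ($numeric_says_equal ? "equal" : "different"), "\n";
print "string:  ", ($string_says_equal  ? "equal" : "different"), "\n";
```

With warnings enabled and the 'numeric' category left on, the == comparison would also report that the arguments are not numeric, which is one way this kind of slip gets caught early.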


    $lastValidBackupTime=`/usr/openv/netbackup/bin/admincmd/bpcatlist -client $client 2>&1 | grep $client | head -n1 | awk '{print $2}'`;

    You are getting a single field from a single line from an externally executed command. Because AWK splits its fields on whitespace the only whitespace in the returned string will be the newline that the backquotes add.

    $lastValidBackupTime=~s/\s+/ /g;

    You are converting the newline at the end of the string to a single space character.

    @temp=split(" ",$lastValidBackupTime);

    You are assigning to $temp[0] the contents of $lastValidBackupTime.

    $lastValidBackupTime="$temp[1] $temp[2] $temp[3] $temp[4]";

    You are assigning to $lastValidBackupTime the string "   ".

    Why did you do all that just to assign three spaces to $lastValidBackupTime?
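One detail worth verifying before relying on this walkthrough: backticks interpolate like a double-quoted string, so Perl expands $2 itself before the shell or awk ever sees the command. The sketch below uses an ordinary double-quoted string as a stand-in for the backticks, an assumption made so that it runs without NetBackup or awk installed, and the client name and shortened command are made up.

```perl
use strict;
use warnings;

# Hypothetical client name, for illustration only.
my $client = 'dbhost01';

# A double-quoted string shows what the shell would receive from backticks.
# $2 is Perl's second regex capture group here; no regex has matched in this
# program, so it is undefined and expands to nothing.
my $unescaped = do {
    no warnings 'uninitialized';
    "bpcatlist -client $client | awk '{print $2}'";
};

# Escaping the dollar sign leaves $2 intact for awk.
my $escaped = "bpcatlist -client $client | awk '{print \$2}'";

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";
```

If that is what happens in the script, awk receives '{print }' and prints the whole matching line, which would give the later split several fields to pick from rather than one. Escaping the dollar sign, or parsing the command output in Perl, removes the ambiguity either way.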

      What is the point?

      Separation (organization).

      Hey, super reply, jwkrahn. Thanks for the critique--seriously. I like having the opportunity to improve my style a bit, which is why I come here.

      In answer to your questions:

      1) I break out my code with subroutines simply for the sake of readability and organization. When I sit down to code, I write a simple block of steps in plain English as to what I'd like to do. From there, I build out the steps in code. If it turns out that over the course of hashing the code out, one of those routines needs to be called repeatedly, then I'm already there. :)

      2) My understanding is that if I were to supply a single whitespace in the split call WITHOUT first boiling out the instances of >1 consecutive whitespaces, I may end up with an array filled with things like " foo" and "bar ", not "foo" and "bar". My goal there was to have an array filled with nothing but values of importance, not interspersed with values of no importance (whitespaces).

      3) You are correct, I am doing a numerical compare on a string, and an exists call would suffice. This is just shorthand on my part. This way, I don't need to distinguish between numerical and non-numerical comparisons in my code. Helps third-party readability too, I suppose.

      4) I had derived "$lastValidBackupTime" using a different method earlier. The regex to deflate the string is a holdover from that method, and superfluous.

      5) My goal there (if I remember correctly) was to break out a string containing a timestamp to something I could format neatly at will. See #3 above.

      Thanks again for the writeup!
        I break out my code with subroutines simply for the sake of readability and organization.

        That is great for Perl4 code but you should really start using the warnings and strict pragmas in your code.

        2) My understanding is

        split has an exception to the rule that the first argument is always interpreted as a regular expression and that exception is a string with a single space character. It is almost like split( /\s+/ ) except that any leading whitespace is ignored. The exception is there so that Perl can imitate AWK with the -a switch.
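The three behaviours in question can be compared side by side. This is a small sketch using a made-up input line with leading, repeated, and trailing whitespace; nothing in it comes from real bpdbjobs output.

```perl
use strict;
use warnings;

# Made-up line with leading, repeated, and trailing whitespace.
my $line = "  foo   bar \n";

my @awk_mode = split ' ',   $line;   # special case: runs of whitespace, leading ones skipped
my @regex    = split /\s+/, $line;   # same separator, but a leading empty field is kept
my @literal  = split / /,   $line;   # exactly one space per separator

print scalar(@awk_mode), " fields: [", join('][', @awk_mode), "]\n";
print scalar(@regex),    " fields: [", join('][', @regex),    "]\n";
print scalar(@literal),  " fields\n";
```

Only the literal-space regex form leaves empty strings scattered through the result; the single-space string form returns just the two words, so no whitespace clean-up is needed beforehand.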

Re: Failboat -- An Emotionally Disturbed Tool For Checking NetBackup Client Coverage
by jffry (Hermit) on Mar 19, 2010 at 15:03 UTC

    Thanks for contributing. I mean that earnestly.

    Please forgive my one small Unix gripe, however. There is no need to pipe grep output to awk. awk will do it all.

    me@mybox:~/sandbox $ cat data.txt
    bob 21 xyz
    sam 8 uuu
    bob 90 fff
    sue 12 qaz
    bob 99 har
    me@mybox:~/sandbox $ grep bob data.txt | head -1 | awk '{print $2}'
    21
    me@mybox:~/sandbox $ awk '/bob/ {print $2; exit}' data.txt
    21

    I was referring to this line of yours:

    $lastValidBackupTime=`/usr/openv/netbackup/bin/admincmd/bpcatlist -client $client 2>&1 | grep $client | head -n1 | awk '{print $2}'`;

      That is your problem with it, Really?

      How about using Perl to do that, since we are doing this in a Perl program:

      open my $PIPE, '-|', "/usr/openv/netbackup/bin/admincmd/bpcatlist -client $client 2>&1"
          or die "Cannot open pipe from 'bpcatlist' $!";

      while ( <$PIPE> ) {
          next unless /$client/;
          $lastValidBackupTime = ( split )[ 1 ];
          last;
      }

      close $PIPE
          or warn $! ? "Error closing 'bpcatlist' pipe: $!" : "Exit status $? from 'bpcatlist'";

        jwkrahn wrote:

        >That is your problem with it, Really?

        Yes. I did not go through the trouble to respond for the sake of sarcasm. It was a sincere response.

        Besides, I suspected someone more clever and skilled at Perl such as you would respond with appropriate Perl syntax, and I was still able to spread the message of useless use of grep.

        So it looks like a win-win situation to me.

      Oh, I know. That backticked statement was just off the top of my head. I'm aware that awk is perfectly capable of doing what I've described ala piping. Sadly, I've never used awk enough to know how to use it properly in this context.
Re: Failboat -- An Emotionally Disturbed Tool For Checking NetBackup Client Coverage
by rementis (Beadle) on May 07, 2010 at 01:06 UTC
    Very cool script, I'm going to try it out right now!