NetApp storage administrators tend to be a little lazy. Perhaps understandably so---NetApps are really good at "fire and forget" storage; you simply set up a volume and let WAFL handle the rest, versus sitting down and developing a working strategy up front that maximizes the filer's potential.

Unfortunately, as a result, a goodly number of otherwise good SAN admins only start to pay attention when performance begins to go south. And when it goes south, it goes south in a hurry.

This script looks to prevent the going-south part by examining everything from the environmentals on the filer to volume and aggregate space utilization. It relies on some general rules of thumb when it comes to its suggestions, such as keeping at least 15-20% of each volume free to prevent heavy increases in fragmentation under the hood.
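
As a minimal sketch of that rule of thumb, the helper below mirrors the arithmetic the script uses in its report: anything at or above 85% used is flagged, and the suggested growth is the amount needed to get back to 20% of headroom. The sub name `suggested_increment` and the two constant names are illustrative only and do not appear in FilerProbe.

```perl
use strict;
use warnings;

# Thresholds taken from the script in this post; the names are hypothetical.
my $WARN_PERCENT     = 85;   # flag anything at or above this usage
my $HEADROOM_PERCENT = 20;   # free space the suggestion aims to restore

# Returns the suggested expansion figure, or 0 when the volume is
# below the warning threshold.
sub suggested_increment {
    my ($percent_used) = @_;
    return 0 if $percent_used < $WARN_PERCENT;
    return $HEADROOM_PERCENT - (100 - $percent_used);
}

printf "%d%% used -> expand by at least %d%%\n", $_, suggested_increment($_)
    for (70, 85, 92, 110);
```

The same formula covers the over-100% case the script treats separately, since (used - 100) + 20 and 20 - (100 - used) are the same number.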

The only added part you need to get FilerProbe working for you is the MIB file for your particular filer, since it pulls information via SNMP. This is available via NetApp and other sources on the web. Place it in the same directory as this script, and you're ready to fly.


When not sailing in his 15-foot beechwood schooner, FilerProbe enjoys long walks on the beach, fine wines, and listening to his extensive collection of Foreigner bootlegs. "Even though I'm a Perl script, I like to think I'm a trend-bucker," says FilerProbe. "I'm a Gemini -- I blaze my own path, and any foxy lady I might meet will enjoy that part of me, I think."

To meet FilerProbe, press 3 at the tone.




#!/usr/bin/perl

## FilerProbe written (061909:0914) by Bowie J. Poag
## FilerProbe keeps tabs on a NetApp, warns of error states and attempts to predict problems with storage utilization.
##
## Usage:
##
## filerprobe.pl <hostname>
##
## Internal:
##
## snmpget -v1 -c public -m ./NETWORK-APPLIANCE-MIB.txt tmcnetapp3 NETWORK-APPLIANCE-MIB::productModel.0
##
##

chomp($date=`date`);

$fetchCommand="snmpget";
$tableFetchCommand="snmptable";
$SNMPVersion=1;
$communityString="public";
$MIBFilename="./NETWORK-APPLIANCE-MIB.txt";
$MIB="NETWORK-APPLIANCE-MIB";
$filerName=$ARGV[0];

spinUp();
analyzeFilerHealth();
collectVolumeMetrics();
reportDump();
spinDown();

sub spinUp
{
    print "\n";
    print "FilerProbe: Spinning up..\n";
    ## Ho hum..
}

sub analyzeFilerHealth
{
    print "FilerProbe: Collecting filer status data..\n";

    @fetchList=
    (
        productModel,                   #
        productVersion,                 #
        cpuUpTime,                      #
        miscGlobalStatusMessage,        #
        diskTotalCount,                 #
        diskActiveCount,                #
        diskReconstructingCount,        #
        diskReconstructingParityCount,  #
        diskFailedCount,                #
        envFailedFanCount,              #
        envFailedPowerSupplyCount,      #
        nvramBatteryStatus              #
    );

    foreach $item (@fetchList)
    {
        chomp($item=`$fetchCommand -v$SNMPVersion -O qv -c $communityString -m $MIBFilename $filerName $MIB\:\:$item.0\n`);
        $item=~s/\"//g;
        $item=~s/\n//g;
    }

    @version=split(":",$fetchList[1]);
    @uptime=split(":",$fetchList[2]);

    print "FilerProbe: Filer $filerName ($version[0]) has been up for $uptime[0] days, $uptime[1] hours, $uptime[2] minutes, $uptime[3] seconds.\n";

    if ($fetchList[8]>0)
    {
        push(@statusReport,"There are currently $fetchList[8] spindles marked as dead on this filer. They need to be replaced.\n");
    }

    if ($fetchList[6]>0)
    {
        push(@statusReport,"FilerProbe: One or more spindles are running degraded.. $fetchList[6] volume reconstructions and $fetchList[7] parity reconstructions are currently active.\n");
    }

    if ($fetchList[9]>0)
    {
        push(@statusReport,"One or more cooling fans have failed. This is potentially bad.\n");
    }

    if ($fetchList[10]>0)
    {
        push(@statusReport,"One or more power supplies have failed. This is very bad.\n");
    }

    if ($fetchList[11]!~/ok/)
    {
        push(@statusReport,"Something's wrong with NVRAM battery. (Status is $fetchList[11])\n");
    }

    print "FilerProbe: Current global status message is: $fetchList[3]\n";
}

sub collectVolumeMetrics
{
    print "FilerProbe: Collecting volume status data..\n";
    print "FilerProbe: \n";

    @volumes=`$tableFetchCommand -v1 -c $communityString -C Hf \: -m $MIBFilename $filerName $MIB\:\:dfTable\n`;

    foreach $item (@volumes)
    {
        @temp=split(":",$item);
        $temp[1]=~s/\"//g;
        graphIt();
        $temp[1]=~s/\/$//g;

        if ($temp[5] >=85)
        {
            $badAreas{$temp[1]}=$temp[5];
        }
    }

    print sort @graphs;
}

sub graphIt
{
    $volName=$temp[1];
    $percentUsed=$temp[5];
    $barLength=($percentUsed*50)/100;
    $starString="";
    $barGraph="[";
    $x=0;

    while ($x<25)
    {
        if ($x<$barLength/2)
        {
            $barGraph.="#";
        }
        else
        {
            $barGraph.=" ";
        }
        $x++;
    }

    $barGraph.="] $percentUsed%";
    push (@graphs,sprintf "FilerProbe: %-28.28s %s\n",$volName,$barGraph);
}

sub reportDump
{
    print "FilerProbe:\n";
    print "FilerProbe: Report date: $date\n";
    print "FilerProbe:\n";
    print "FilerProbe: Filer Status: \n";
    print "FilerProbe:\n";

    if ($#statusReport==-1)
    {
        print "FilerProbe: This filer appears to be in good shape health-wise. No hardware warnings found.\n";
    }
    else
    {
        print "FilerProbe: There appears to be one or more things wrong with the filer.\n";
        print @statusReport;
    }

    print "FilerProbe:\n";
    print "FilerProbe: Suggested actions:\n";
    print "FilerProbe:\n";

    while (($volName,$percentUsed)=each %badAreas)
    {
        if ($percentUsed>100)
        {
            $increment=($percentUsed-100)+20;
            push(@actions,"FilerProbe: Snapshot region $volName is eating into it's parent volume. Not good.\n");
        }
        else
        {
            $increment=20-(100-$percentUsed);
        }

        if ($volName=~/vol/ && $volName!~/snap/)
        {
            push(@actions,"FilerProbe: Volume $volName should be expanded by at least $increment%, and defragged.\n");
        }

        if ($volName=~/aggr/i)
        {
            push(@actions,"FilerProbe: Aggregate $volName needs more spindles added to it to allow for volume growth.\n");
        }

        if ($volName=~/snap/ || ($volName=~/snap/i && $volName=~/aggr/i))
        {
            push(@actions,"FilerProbe: Snapshot region $volName should be increased by at least $increment%.\n");
        }
    }

    print sort(@actions);
}

sub spinDown
{
    print "FilerProbe: \n";
    print "FilerProbe: Exiting..\n\n";
}

(Unfortunately, my employer's security policy prevents me from pasting a detailed copy of FilerProbe's rather nicely-formatted output.)

Re: FilerProbe: Tidy Up Your NetApp
by jwkrahn (Abbot) on Jun 24, 2009 at 17:25 UTC

    You are using subroutines incorrectly. You are not passing any data to the subroutines. You are not returning any data from the subroutines. You do not have any data that is local to the subroutines; all the data is global. And you only really need subroutines for code that will be called more than once, but all your subroutines are only called one time.
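
To make the point about passing and returning data concrete, here is a minimal sketch of how the script's bar-graph step might look as a self-contained subroutine. The name `bar_graph`, its two parameters, and the sample volume names are assumptions for illustration; the bar arithmetic follows the original `graphIt` (25 cells, one `#` per 4% used).

```perl
use strict;
use warnings;

# Build one report line with a 25-character usage bar. Everything the
# sub needs arrives as arguments, and the result is returned rather
# than pushed onto a global array.
sub bar_graph {
    my ($vol_name, $percent_used) = @_;
    my $width = 25;
    my $bar   = '';
    for my $x (0 .. $width - 1) {
        $bar .= $x < $percent_used / 4 ? '#' : ' ';
    }
    return sprintf "FilerProbe: %-28.28s [%s] %s%%\n",
        $vol_name, $bar, $percent_used;
}

# Hypothetical sample data standing in for rows of the SNMP dfTable.
my @graphs;
push @graphs, bar_graph(@$_) for (['/vol/vol0/', 40], ['/vol/data/', 92]);
print sort @graphs;
```

Because the sub touches no globals, it can be exercised on its own without a filer to query.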

    You should include the warnings and strict pragmas so that perl can help you find mistakes, for example:

    @fetchList=
    (
        productModel,                   #
        productVersion,                 #
        cpuUpTime,                      #
        miscGlobalStatusMessage,        #
        diskTotalCount,                 #
        diskActiveCount,                #
        diskReconstructingCount,        #
        diskReconstructingParityCount,  #
        diskFailedCount,                #
        envFailedFanCount,              #
        envFailedPowerSupplyCount,      #
        nvramBatteryStatus              #
    );

    should probably be like this instead:

    @fetchList= qw(
        productModel
        productVersion
        cpuUpTime
        miscGlobalStatusMessage
        diskTotalCount
        diskActiveCount
        diskReconstructingCount
        diskReconstructingParityCount
        diskFailedCount
        envFailedFanCount
        envFailedPowerSupplyCount
        nvramBatteryStatus
    );
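
A small sketch of why the pragmas matter here: with `strict` enabled, the bareword form of the list fails at compile time, while the `qw` form is just a list of strings. The shortened three-item list and the variable names `@fetch_list`, `$ok`, and `$error` are illustrative; the string `eval` is only there so the failure can be observed without aborting the demonstration.

```perl
use strict;
use warnings;

# qw() yields a list of plain strings, so nothing here is a bareword.
my @fetch_list = qw(productModel productVersion cpuUpTime);

# The bareword form only compiles without "strict subs". A string eval
# lets us watch the compile-time failure without killing this script.
my $ok = eval q{
    use strict;
    my @bare = (productModel, productVersion, cpuUpTime);
    1;
};
my $error = $ok ? '' : $@;

print "qw list has ", scalar(@fetch_list), " items\n";
print "strict rejected the bareword list: $error" unless $ok;
```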


    $temp[1]=~s/\/$//g;

    You are using the  $ anchor which says to only match at the end of the string and you are using the  /g option which says to match everywhere in the string.   You can use one or the other but not both.

      Hey jw! I would disagree with your statement that I am using subroutines "incorrectly". I often use subroutines as a way of keeping different blocks of code separated logically. Which is why I'm not passing data to subroutines. Which is why I'm not returning any data from subroutines. Which is why I don't have any data local to the subroutines. Which is why I have everything of any consequence they might operate on declared globally up front. :)

      I usually begin every program I write with a bunch of plain English statements that describe what I want the code to do. These statements end up being boiled down into subroutine names, just like what you see here. I've found it keeps code complexity to a minimum, makes debugging easier, and even prevents bloat/function creep in a way. It's a single-purpose, flat procedural script broken down into blocks for my own and others' readability, and just fleshed out as I go. You'll notice that not only are most of the subroutines called just once, but a couple of them do next to nothing at all.

      As for the $temp[1]=~s/\/$//g; statement you pointed out, you are correct -- applying the regex globally is superfluous. s/\/$//g is functionally identical to s/\/$//. I probably had it match for something else earlier, changed it, and forgot to remove the g. :)
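
A quick way to check that equivalence is to run both forms over a few sample paths and compare. This is only a sketch: the helper names `strip_anchored` and `strip_global` and the sample paths are made up for the comparison, and it demonstrates the behaviour only for the inputs tried.

```perl
use strict;
use warnings;

# Strip a trailing slash, once with the anchored form and once with /g.
sub strip_anchored { my ($s) = @_; $s =~ s/\/$//;  return $s }
sub strip_global   { my ($s) = @_; $s =~ s/\/$//g; return $s }

# Hypothetical sample paths, including a doubled trailing slash.
for my $path ('/vol/vol0/', '/vol/vol0', '/vol/data//', '/') {
    my ($once, $global) = (strip_anchored($path), strip_global($path));
    printf "%-14s -> %-14s %s\n", $path, $once,
        $once eq $global ? 'same' : 'DIFFERENT';
}
```

Since the end of the string can only be matched once, both forms remove at most one slash.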

        Congratulations - You're 10% of the way toward structured programming. Hold on to your hat - In the next decade, a new paradigm called "object-oriented programming" will sweep the world.

        jwkrahn is absolutely spot on in all of his criticisms.

        For a program such as this, which is a simple stand-alone script which has one thing to do and does it just once, it doesn't matter too much. It may get your job done, but it's not the kind of code I'd be waving in front of my boss.

      I'm going to have to disagree on "qw" because it doesn't support comments. If it treated a line that begins with a pound sign as a comment then it would be really useful, but since it doesn't, it's just too annoying when I want to comment out some entries. I suppose if you had a static list that you knew you were never going to change then it would be fine, but as a systems engineer I don't run into that situation very often (except for really short single-line lists).

      Elda Taluta; Sarks Sark; Ark Arks
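
One way to keep both the comments and `strict` compatibility is an ordinary list of quoted strings, sketched below. The shortened list, the comment text, and the choice of which entry to disable are illustrative only.

```perl
use strict;
use warnings;

# A plain quoted list accepts trailing comments and lets individual
# entries be commented out, which qw() does not.
my @fetch_list = (
    'productModel',       # model string
    'productVersion',     # version string
  # 'cpuUpTime',          # commented out for now
    'diskFailedCount',    # failed disk count
);

print "$_\n" for @fetch_list;
```

The trailing comma after the last entry is legal Perl and makes it painless to add or disable lines later.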

Re: FilerProbe: Tidy Up Your NetApp
by ZlR (Chaplain) on Jul 01, 2009 at 12:25 UTC
    Wouldn't it be better to process the autosupport file rather than sending live probes to a presumably already loaded system?

    Just a thought ;-)