grinder has asked for the wisdom of the Perl Monks concerning the following question:
I've been using MRTG for a while now, to watch Cisco routers (even if they do have really buggy snmp support), HP switches and sundry servers from a number of manufacturers. I'm currently setting up a VPN using Netscreen routers and ADSL lines, and I wanted to monitor them. As soon as I plugged them in, I started getting really weird non-reproducible errors. Sometimes the router would respond, sometimes not, and no way to understand why.
After futzing around with snmpwalk and some other support tools in the NetSaint kit, I felt pretty sure that the Netscreen routers were working correctly, it was the MRTG code that was wrong.
Once I started looking there, it was pretty quick to zoom into the offending code. In a nutshell, with SNMP when you want to ask a piece of equipment for some information, you have to tell it what community you belong to (read: authentication), what information you want, and a request id, which I surmise is kind of like a cookie.
By instrumenting the code with print statements, it became apparent that the following conditional was at fault. It would be true or false at random:
if ($response_community ne $this->{community} || ($response_id) != $this->{request_id})) { # ... }
... and that determines success or failure in receiving a response. So I printed out the values of these variables and hit the jackpot:
print "$response_community $this->{community} $response_id $this->{req +uest_id}\n"; # produces snmpfoo snmpfoo 2325544852 -1969422444
Looks like 32-bit signedness rearing its ugly head. If the MSB is zero, the test passed, if one, it failed. I had to get a job done, so I just packed the values and string compared them:
if ($response_community ne $this->{community} || pack('l',$response_id) ne pack('l',$this->{request_id})) { # ... }
So my questions are:
My personal feeling for that last one is a yes, but I'm not well versed in SNMP to know whether that's reasonable or not. I have traced the problem down to a rather horrendous piece of code in the BER.pm module called "decode_by_template", which is passed a sprintf-like string which is decoded manually. Trying to trace through it makes me want to gouge my eyes out: it's a textbook case of creature feep.
I think I'll just mail my hack to the maintainer, and see what he wants to do about it, but any recommendations or ideas will be much appreciated, before I send it off.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
(tye)Re: Bug in SNMP_Session when polling Netscreen routers with MRTG
by tye (Sage) on Jun 27, 2002 at 02:53 UTC |