[mrtg] Problem: unrealistic spikes in charts (missed snmp poll)

Rich Adamson radamson at routers.com
Wed Apr 17 15:58:21 MEST 2002


I've recently completed testing of v2.8.17 through v2.9.18pre3 and found
that mrtg's programming logic for missed snmp polls is erroneous, causing
huge spikes in the mrtg charts at times. Each version attempts to make 
the chart visually pleasing by "faking" a value, and the calculation of 
that fake value is incorrect. Our tests started with an empty log file, 
polling a Cisco router for a single ethernet interface while tracing the
snmp values with a Sniffer. I disrupted mrtg's communications path to the 
router causing it to miss a single five-minute polling cycle. The actual 
log file looks like:

  1018898222 888472927 2493104300 <== actual snmp counter values
  1018898222 421 1310 421 1310
  1018897922 187 1004 187 1004
  1018897800 1250069 3507792 3073668 8624255
  1018897500 1711078 4801268 3073668 8624255  <=== fake value 
  1018897200 157 978 157 978
  1018896900 342 1525 614 2325
  1018896600 364 1379 614 2325
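For anyone who wants to reproduce this, here is a minimal Python sketch that splits such a log into the raw counter line and the history rows. It assumes the usual mrtg log layout as I understand it: the first line holds a timestamp and the raw in/out counters, and every later line holds a timestamp, average in, average out, maximum in and maximum out (bytes per second). The names `parse_log`, `Current` and `History` are hypothetical and are not part of mrtg.

```python
from __future__ import annotations

from typing import NamedTuple


class Current(NamedTuple):
    """First log line: poll timestamp plus the raw SNMP in/out counters."""
    time: int
    in_counter: int
    out_counter: int


class History(NamedTuple):
    """Later log lines: timestamp, average in/out and maximum in/out (bytes/s)."""
    time: int
    avg_in: int
    avg_out: int
    max_in: int
    max_out: int


def parse_log(text: str) -> tuple[Current, list[History]]:
    """Split an mrtg-style log into the raw counter line and the history rows."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    first, rest = rows[0], rows[1:]
    # Ignore anything after the expected columns (e.g. trailing annotations).
    current = Current(*(int(field) for field in first[:3]))
    history = [History(*(int(field) for field in row[:5])) for row in rest]
    return current, history


# The log from the test above, without the "<==" annotations.
LOG = """\
1018898222 888472927 2493104300
1018898222 421 1310 421 1310
1018897922 187 1004 187 1004
1018897800 1250069 3507792 3073668 8624255
1018897500 1711078 4801268 3073668 8624255
1018897200 157 978 157 978
1018896900 342 1525 614 2325
1018896600 364 1379 614 2325
"""

current, history = parse_log(LOG)
print(current)
print(history[2])
```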

The first line in the log file reflects the actual snmp values returned
by an snmp poll. All remaining lines in the file represent past historic
averages used to create the various mrtg charts. The first successful 
snmp poll after the missed snmp response created huge "fake" values (which
in the above case represent 3,073,668 ifInOctets and 8,624,255 ifOutOctets
bytes per second). The row before the outage shows 157 ifInOctets and 978
ifOutOctets bytes per second, which were verified as accurate. Comments
within the rateup.c source code indicate the intent was to replicate a
previous historic value for a missed poll. However, as noted above, an
unrealistic fake value is inserted instead, and then it is replicated.
(From a network management perspective, I question the value of mrtg
inserting fake values into a chart rather than simply reporting a zero
value. Those of us who manage resources can at least understand a zero
value, whereas the fake value covers up the real issue.)

The "fake" values that were created by mrtg have "some" relationship to
the actual raw snmp values returned; however, I have not been able to
determine what that relationship is (other than that it is incorrect).
Further, since there is some relationship to the raw snmp counter values,
the size of the fake value (and the resulting spike on the chart) depends
on how long the measured device has been running and on the magnitude of
the raw snmp counters.
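One cheap way to probe that relationship is to divide each raw counter by the matching fake rate, using only the numbers from the log above. In this sample both directions give nearly the same divisor (about 289 seconds), which is consistent with, but does not prove, a fake rate computed from the absolute counter value rather than from a delta between polls. The raw counters come from the latest poll, a few minutes after the fake entry, so the ratios are only approximate; `implied_interval` is a hypothetical helper.

```python
from __future__ import annotations

# Raw counters from the first log line and the fake maximum rates recorded
# after the missed poll, both copied from the log above.
RAW_COUNTERS = {"in": 888_472_927, "out": 2_493_104_300}
FAKE_RATES = {"in": 3_073_668, "out": 8_624_255}


def implied_interval(raw: dict[str, int], fake: dict[str, int]) -> dict[str, float]:
    """Seconds that would turn each absolute counter into the observed fake rate."""
    return {direction: raw[direction] / fake[direction] for direction in raw}


for direction, seconds in implied_interval(RAW_COUNTERS, FAKE_RATES).items():
    print(f"{direction}: {seconds:.1f} s")
```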

Therefore, some mrtg users will see the spike and others will not,
depending on the actual raw snmp counter values returned by the device. (If
the measured device is frequently rebooted or its snmp counters are always
small, the fake value and associated spike are less noticeable.)

Since the fake values and resulting spike can be of "any" size, mrtg users
should be aware of this problem and not trust everything that is displayed
on the charts. The fake values also impact the weekly, monthly and yearly
charts, as well as the numeric summary immediately below each chart.
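Until the calculation is fixed, a crude safeguard is to scan the log for history rows whose maximum rate exceeds what the interface can physically carry. The sketch below assumes a 10 Mbit/s Ethernet port (1,250,000 bytes/s); the actual interface speed was not stated, so adjust the ceiling to suit. `flag_spikes` is a hypothetical helper, and it only checks the maximum columns.

```python
from __future__ import annotations

from typing import Iterable


def flag_spikes(rows: Iterable[tuple[int, int, int, int, int]],
                max_bytes_per_sec: int) -> list[int]:
    """Return timestamps of rows whose max in/out rate exceeds the ceiling."""
    return [
        time
        for time, _avg_in, _avg_out, max_in, max_out in rows
        if max_in > max_bytes_per_sec or max_out > max_bytes_per_sec
    ]


# History rows from the log above: time, avg in, avg out, max in, max out.
HISTORY = [
    (1018898222, 421, 1310, 421, 1310),
    (1018897922, 187, 1004, 187, 1004),
    (1018897800, 1250069, 3507792, 3073668, 8624255),
    (1018897500, 1711078, 4801268, 3073668, 8624255),
    (1018897200, 157, 978, 157, 978),
    (1018896900, 342, 1525, 614, 2325),
    (1018896600, 364, 1379, 614, 2325),
]

# Assumed ceiling: 10 Mbit/s Ethernet = 10,000,000 / 8 = 1,250,000 bytes/s.
ETHERNET_10MBIT_BYTES_PER_SEC = 1_250_000
print(flag_spikes(HISTORY, ETHERNET_10MBIT_BYTES_PER_SEC))
```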

If anyone has any idea how to fix this issue, I'd greatly appreciate 
hearing from them.

Rich
radamson at routers.com


