[mrtg] Problem: unrealistic spikes in charts (missed snmp poll)
Rich Adamson
radamson at routers.com
Wed Apr 17 15:58:21 MEST 2002
I've recently completed testing of v2.8.17 through v2.9.18pre3 and found
that the mrtg programming logic for missed snmp polls is erroneous, at times
causing huge spikes in the mrtg charts. Each version attempts to make
the chart visually pleasing by "faking" a value, and the calculation of
that fake value is incorrect. Our tests started with an empty log file,
polling a Cisco router for a single ethernet interface while tracing the
snmp values with a Sniffer. I disrupted mrtg's communications path to the
router, causing it to miss a single five-minute polling cycle. The resulting
log file looks like:
1018898222 888472927 2493104300 <== actual snmp counter values
1018898222 421 1310 421 1310
1018897922 187 1004 187 1004
1018897800 1250069 3507792 3073668 8624255
1018897500 1711078 4801268 3073668 8624255 <=== fake value
1018897200 157 978 157 978
1018896900 342 1525 614 2325
1018896600 364 1379 614 2325
The first line in the log file reflects the actual snmp values returned
by an snmp poll. All remaining lines in the file represent past historic
averages used to create the various mrtg charts. The first successful
snmp poll after the missed snmp response created huge "fake" values (in
the above case, 3,073,668 bytes per second for ifInOctets and 8,624,255
bytes per second for ifOutOctets). The previous poll shows 157 (ifInOctets)
and 978 (ifOutOctets) bytes per second, which were validated as accurate. Comments within the
rateup.c source code indicate the intent was to replicate a previous historic
value for a missed poll. However, as noted above, an unrealistic fake value
is inserted instead, and then it is replicated. (From a network management
perspective, I question the value of mrtg inserting fake values into a
chart rather than simply reporting a zero value on the chart. At least
those of us who manage resources can understand a zero value, whereas the
fake value attempts to cover up the real issue.)
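For comparison, here is a minimal sketch of how a rate can be derived from two raw counter samples when a poll in between was lost. It is not the rateup.c logic. The function name, the 32-bit counter assumption, and the sample numbers are all illustrative. The point is only that dividing the counter delta by the real elapsed time keeps the result bounded over a ten-minute gap, and that a missing earlier sample can be reported as "no data" rather than replaced with an invented value.

```python
COUNTER_MAX = 2**32  # assumption: 32-bit ifInOctets/ifOutOctets counters


def rate_bytes_per_sec(prev_time, prev_count, cur_time, cur_count):
    """Average bytes/second between two successful polls.

    Returns None when there is no usable earlier sample, so the caller
    can chart a gap or a zero instead of inventing a value.
    """
    if prev_time is None or prev_count is None:
        return None
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        return None
    # The modulo absorbs a single counter wrap between the two samples.
    delta = (cur_count - prev_count) % COUNTER_MAX
    return delta / elapsed


# Hypothetical samples: 600 s apart because the poll in between was missed.
print(rate_bytes_per_sec(1000, 500_000, 1600, 590_000))  # 150.0
print(rate_bytes_per_sec(None, None, 1600, 590_000))     # None
```

One caveat with this sketch: a device reboot resets the counters, and the modulo would then misread the reset as a wrap and report a very large delta, so a real poller would also need some form of reset detection.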
The "fake" values that were created by mrtg have "some" relationship to
the actual raw snmp values returned; however, I have not been able to
determine what that relationship is (other than that it is
incorrect). Further, since there is some relationship to the raw snmp
counter values, the size of the fake value (and of the resulting spike on the
chart) is related to how long the measured device has been running
and to the magnitude of the raw snmp counters.
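One quick check a reader could run on the figures in the log excerpt is to divide each raw counter by the corresponding inflated value. The sketch below does only that arithmetic. It is an observation on these specific numbers, not a reading of rateup.c, and the raw counters come from a later poll than the spike, so the ratios are approximate.

```python
# Figures copied from the log excerpt in this message.
raw_in, raw_out = 888472927, 2493104300   # raw counters (first log line)
peak_in, peak_out = 3073668, 8624255      # inflated bytes/second values

# If a spike were simply "counter / some interval", both directions
# should yield about the same interval in seconds.
ratio_in = raw_in / peak_in
ratio_out = raw_out / peak_out

print(f"in:  {ratio_in:.1f} s")
print(f"out: {ratio_out:.1f} s")
```

Both ratios come out at roughly 289 seconds, close to one five-minute polling interval. That would be consistent with the entire counter value, rather than the difference between two polls, being divided by the interval, but it does not establish the cause.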
Therefore, some mrtg users will see the spike and others will not, depending
upon the actual raw snmp counter values returned by the device. (If the
measured device is frequently rebooted or the snmp counters are always small,
the fake value and associated spike are not as noticeable.)
Since the fake values and resulting spike can be of "any" size, mrtg users
should be aware of this problem and not trust everything that is displayed
on the charts. The fake values also impact the weekly, monthly and yearly
charts, as well as the numeric summary immediately below each chart.
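As a stopgap for spotting affected entries, a log can be scanned for rates that the interface could not physically carry. The sketch below runs over the log excerpt from this message. The ceiling of 1,250,000 bytes per second assumes a 10 Mbit/s Ethernet interface, which the message does not state, and the function name is illustrative.

```python
# Log excerpt from this message, without the inline annotations.
LOG_EXCERPT = """\
1018898222 888472927 2493104300
1018898222 421 1310 421 1310
1018897922 187 1004 187 1004
1018897800 1250069 3507792 3073668 8624255
1018897500 1711078 4801268 3073668 8624255
1018897200 157 978 157 978
1018896900 342 1525 614 2325
1018896600 364 1379 614 2325
"""

CEILING = 1_250_000  # assumption: 10 Mbit/s Ethernet, in bytes/second


def suspicious_rows(log_text, ceiling):
    """Return timestamps of rows whose average or maximum exceeds ceiling."""
    flagged = []
    lines = log_text.strip().splitlines()
    for line in lines[1:]:  # the first line holds raw counters, not rates
        fields = [int(x) for x in line.split()]
        timestamp, values = fields[0], fields[1:]
        if max(values) > ceiling:
            flagged.append(timestamp)
    return flagged


print(suspicious_rows(LOG_EXCERPT, CEILING))
# [1018897800, 1018897500]
```

The two flagged rows are the ones carrying the inflated values. A check like this only identifies suspect entries; it does not repair the averages already folded into the weekly, monthly and yearly data.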
If anyone has any idea how to fix this issue, I'd greatly appreciate
hearing from them.
Rich
radamson at routers.com