[mrtg] unknaszero alternatives
jay at west.net
Sat Oct 22 04:07:23 CEST 2011
On 10/18/11 5:28 PM, Tim Chambers wrote:
> I'm an end user of MRTG, not an administrator of MRTG and new to this list.
> I noticed in the MRTG docs the assumption of using the last value if the
> SNMP response is not received. This is probably OK for the occasional
> dropped packet but not so good if it's indicative of a device/link failure.
There is a very large difference between traffic TO a router and traffic
THROUGH a router. A dropped SNMP packet may indicate a problem on the
specific interface to which the MRTG server is connected, or it may be
due to a failure on an intermediate device that has nothing to do with
any of the interfaces being measured on the router.
> My suggestion is a few options to compromise which rely on recording in the
> log if the entry is a result of a success/failure, good/bad, real/estimate
> (true/false for SNMP OK) or whatever you want to call it.
Again, this may be useful solely with respect to the interface to which
the measurement server is connected but is irrelevant to measurements
from other interfaces.
> 1) The first 1 or 2 times it uses the values of previous values.
> 2) Is it possible to check the status of a router or link to/through router in a
> way that uses TCP to verify if it is working or non-existent?
If the interface connected to the measurement server is impaired, then
if any probes are successful, the impairment will show in the
measurement itself. A flat line on the graph stands out very well,
especially compared to something intermittently dropping to zero. We
learned a long time ago that a flat line usually indicates a
failure to get a sane value (or any value at all).
> 3) 3rd time it happens in a row (or whatever the admin uses as a threshold)
> it starts assuming the 0 values. If it can verify the router/link is still working
> then it might assume the link is near its past/recent max. If the router/link
> can be verified to be lost then it's better to show 0.
Even for things like interface errors, CPU usage, temperature, and free
> 4) Is it possible to change past assumed entries to be an average of the
> value before and after the temporary loss of a few responses when the
> SNMP response returns ?
I suppose one could do the math. Assumptions are of course not the same
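
For illustration only, "the math" for suggestion 4 could look something like the sketch below. It is not MRTG code and does not touch MRTG's log files; the function name `backfill_gap` and its parameters are hypothetical. It uses linear interpolation between the last good value and the first good value after the gap, which for a single missed probe reduces to the plain average of the two.

```python
def backfill_gap(before, after, missed):
    """Estimate `missed` lost samples between two good readings.

    before -- last value measured before the gap
    after  -- first value measured after the gap
    missed -- number of consecutive probes that got no response

    Returns a list of `missed` linearly interpolated estimates.
    These are guesses, not measurements.
    """
    if missed < 1:
        return []
    step = (after - before) / (missed + 1)
    return [before + step * i for i in range(1, missed + 1)]


if __name__ == "__main__":
    # One lost probe: the estimate is the average of the neighbours.
    print(backfill_gap(100, 200, 1))
    # Three lost probes: evenly spaced estimates across the gap.
    print(backfill_gap(100, 200, 3))
```

Note that the estimates can only be computed after the next good response arrives, so earlier entries would have to be rewritten after the fact.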
> The basic fail on 3rd attempt would be the easiest to implement and with
> some impact to performance. The other options would require more code &
> IO and would slow the update process.
Another option would be to continue to keep the last value (or drop to
zero if unknaszero is set) but change the color of the graph or
background to indicate lost measurements.
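
A minimal sketch of how the ideas in this thread fit together: hold the last value for the first few missed probes, drop to zero after a threshold, and mark every substituted value so that a graph could draw it in a different color. This is an illustration under stated assumptions, not how MRTG is implemented; `Sample`, `fill_missing`, and `hold_limit` are hypothetical names, and a lost probe is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    value: float
    measured: bool  # False when the value stands in for a lost probe


def fill_missing(readings: List[Optional[float]],
                 hold_limit: int = 2,
                 start: float = 0.0) -> List[Sample]:
    """Replace lost probes (None) with a stand-in value.

    The first `hold_limit` consecutive misses repeat the last good
    value; further misses are recorded as zero. Every stand-in is
    flagged with measured=False so it can be displayed differently.
    """
    out: List[Sample] = []
    last = start
    misses = 0
    for reading in readings:
        if reading is not None:
            last = reading
            misses = 0
            out.append(Sample(reading, True))
        else:
            misses += 1
            value = last if misses <= hold_limit else 0.0
            out.append(Sample(value, False))
    return out


if __name__ == "__main__":
    for sample in fill_missing([10, None, None, None, 20]):
        print(sample)
```

In this sketch, `hold_limit=0` goes to zero on the first miss, roughly the effect of unknaszero, while a very large `hold_limit` always repeats the last value.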
> What do other admins, users and potential MRTG programmers think of my
> suggestions ?
In our network we operate two systems with some things in common. One
is a measurement system, MRTG. The other is an alerting system, Nagios.
Nagios takes some outputs from MRTG and alerts us if some things are
out of range such as extremely high/low utilization of certain
interfaces, memory, temperature, utility voltage and current, etc.
Nagios also alerts us if devices are unreachable, HTML pages fail to
load, a datacenter door is open, a voicemail is in the outages queue for
over a certain amount of time, etc.
And there are many things that we graph with MRTG that we don't use
Nagios to alert on at all. Different tools for different purposes.
> Do you have any alternative suggestions ?
Unknaszero is there for a reason. It is appropriate in some cases, not
in others. It is an option for that reason. Use it where you feel that
it is appropriate. Don't when it is not. In my opinion, if it isn't
appropriate for the first two missed probes, it probably isn't for the
third, at least for anything that we are graphing.
> Is it worth spending time and resources to expand the MRTG features ?
Not in the manner that you suggest, at least for the way that we use
MRTG in our organization. It may be for others.
Jay Hennigan - CCIE #7880 - Network Engineering - jay at impulse.net
Impulse Internet Service - http://www.impulse.net/
Your local telephone and internet company - 805 884-6323 - WB6RDV