[rrd-users] Re: odd spikes due to early resets
BAARDA, Don
don.baarda at baesystems.com
Tue Feb 20 00:21:02 MET 2001
G'day,
> -----Original Message-----
> From: Tobias Weingartner [SMTP:weingart at cs.ualberta.ca]
> Sent: Tuesday, February 20, 2001 9:18 AM
> To: BAARDA, Don
> Cc: 'Sasha Mikheev'; Matt Ashfield; RRD users
> Subject: [rrd-users] Re: odd spikes due to early resets
>
>
> On Tuesday, February 20, "BAARDA, Don" wrote:
> > > -----Original Message-----
> > > From: Sasha Mikheev [SMTP:sasha at netvision.net.il]
> > > Sent: Tuesday, February 20, 2001 3:06 AM
> > > To: Matt Ashfield
> > > Cc: RRD users
> > > Subject: [rrd-users] Re: odd spikes due to early resets
> > >
> > >
> > >
> > > You can define your archive as DERIVE with minimum value of 0. It
> > > will take care of the problem.
> > >
> > Where did this idea come from? The _best_ solution is to set a
> > proper max.
>
> Actually, even with the proper max, the spikes can still drown out
> the real traffic (1Gbps pipe, traffic in the range of 20-30Mbps).
>
With a 32bit octet counter, a 1Gbps pipe can potentially wrap in 34
seconds, which means you want to be running with step <= 30 or a 64bit
counter to avoid getting garbage every time there is a genuine traffic
spike. With a 32bit counter and a 5min step, anything larger than 114Mbps
requires multiple counter wraps, so you are never going to see those
rates; instead you'll get garbage. With a 64bit counter, a 1Gbps pipe
will take 4676 years to wrap.
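The wrap figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the counter counts octets (bytes) and the link runs flat out at line rate; the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_seconds(counter_bits, link_bps):
    """Seconds for an octet counter to wrap at a sustained line rate."""
    bytes_per_second = link_bps / 8
    return 2 ** counter_bits / bytes_per_second


def max_unambiguous_bps(counter_bits, step_seconds):
    """Highest rate (bits/s) that fits in one counter wrap per step."""
    return 2 ** counter_bits / step_seconds * 8


wrap_32 = wrap_seconds(32, 1e9)                            # about 34 seconds
wrap_64_years = wrap_seconds(64, 1e9) / SECONDS_PER_YEAR   # about 4676 years
limit_32_5min = max_unambiguous_bps(32, 300)               # about 114 Mbps

print(f"32bit @ 1Gbps wraps in {wrap_32:.1f} s")
print(f"64bit @ 1Gbps wraps in {wrap_64_years:.0f} years")
print(f"32bit, 5min step tops out at {limit_32_5min / 1e6:.1f} Mbps")
```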
You can calculate the probability that a reset will be misinterpreted
as a wrap from your specified_max and the measurable_max, where the
measurable_max is the highest possible reading you could get without a
max setting. A reset (or any random counter "jump") in theory can result
in a bandwidth reading anywhere in the range 0 to the measurable_max with
an even probability (more on this assumption later). Any result in the
range 0 to specified_max will be incorrectly interpreted as a valid value.
Hence the probability of incorrectly interpreting a reset as a valid wrap
is:
err_probability = (specified_max / measurable_max)
where:
measurable_max = (counter_max / step)
For a typical application of a 32bit counter, 1Mbps interface, and
5min step this works out as:
err_probability = ( (10^6 / 8) / ( 2^32 / 300) ) = 0.87%
This is _very_ low!!! You can simply multiply this by your max in Mbps
to get the probability of a mistake when using a 32bit counter and 5min
step... ie a 20Mbps max is 20 x 0.87 = roughly 17% chance of a mistake.
The probability of mistaking a reset for a wrap only starts to get high
when your max becomes significant relative to the measurable_max... ie
when you are getting close to getting garbage from counter wraps anyway.
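The formula is easy to evaluate for other link speeds and steps. A minimal sketch, assuming an octet counter and a max given in bits per second; the function names are illustrative only.

```python
def measurable_max(counter_bits, step_seconds):
    """Highest reading (bytes/s) a single counter wrap can produce per step."""
    return 2 ** counter_bits / step_seconds


def err_probability(specified_max_bps, counter_bits=32, step_seconds=300):
    """Chance that a reset lands inside 0..specified_max and is accepted."""
    specified_max = specified_max_bps / 8  # bits/s -> bytes/s
    return min(1.0, specified_max / measurable_max(counter_bits, step_seconds))


for mbps in (1, 20, 100):
    print(f"{mbps:>4} Mbps max, 32bit, 5min: {err_probability(mbps * 1e6):.2%}")

# The same 1Gbps max with a 64bit counter is vanishingly unlikely to misfire.
print(f"1 Gbps max, 64bit, 5min: {err_probability(1e9, counter_bits=64):.2e}")
```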
If you are using a 64bit counter, the probability of mistaking a
reset for a counter wrap is so close to zero, even for a 1Gbps interface,
that you can ignore it. This is a good argument for using 64bit
counters... in combination with a suitable max it can eliminate "spikes"
more "correctly" than any other technique.
Note that I made the assumption that a reset results in an even
distribution of random results. This assumes that resets occur with equal
probability when the counter is at any value, ie a reset is just as likely
when the counter is at 1 as when it is at 2^32-1. This is true if the
counter wraps many times between resets. However, particularly with 64bit
counters, in reality counter resets usually happen more frequently than
counter wraps, which means that a reset is more likely when the counter is
low than when it is high. A reset from a low counter value looks like a
wrap that covered almost the whole counter range, so resets usually result
in larger values, which a max is more likely to reject. This means the
assumption of an even distribution of results on a reset is actually
conservative, so the real probability of errors is lower...
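To see how much the even-distribution assumption overstates the risk, here is a rough sketch. It assumes a 32bit octet counter, a post-reset reading of roughly zero, and a pre-reset counter value spread evenly over 0 up to a hypothetical highest_pre_reset_value; none of these names come from rrdtool.

```python
COUNTER_MAX = 2 ** 32


def accept_probability(specified_max_bps, highest_pre_reset_value,
                       step_seconds=300):
    """Chance a reset is accepted as a wrap, given how high the counter got."""
    # Largest byte delta per step that still passes the max check.
    limit = int(specified_max_bps / 8 * step_seconds)
    # With a post-reset reading of ~0 the apparent delta is COUNTER_MAX - old,
    # so only old values at or above this threshold slip under the max.
    first_accepted = COUNTER_MAX - limit
    accepted = max(0, highest_pre_reset_value - first_accepted)
    accepted = min(accepted, highest_pre_reset_value)
    return accepted / highest_pre_reset_value


# Counter equally likely to be anywhere: matches the simple formula.
print(f"{accept_probability(20e6, 2 ** 32):.2%}")
# Counter never gets past half range between resets: nothing slips through.
print(f"{accept_probability(20e6, 2 ** 31):.2%}")
```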
> One solution is to put up a NaN every time the counter rolls around
> or is reset. At least I've found that to be more acceptable than
> the other way around...
>
> Wish rrdupdate had an option to put up NaN's (don't do counter
> wrap-around calculation) when the counters reset/wrap.
>
It sort of does... this is exactly what DERIVE with min=0 does.
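The difference between the two data source types can be sketched as below. This is a simplified model of the behaviour discussed in this thread, not rrdtool's actual code: COUNTER adds back a wrap on a negative delta, DERIVE keeps the raw delta, and anything outside min/max becomes unknown (None here stands in for NaN). The function and parameter names are made up for illustration.

```python
def rate(ds_type, prev, cur, step_seconds, ds_min=None, ds_max=None):
    """Per-second rate for one update, or None where rrdtool would store NaN."""
    delta = cur - prev
    if ds_type == "COUNTER" and delta < 0:
        delta += 2 ** 32                # assume a 32bit wrap
        if delta < 0:
            delta += 2 ** 64 - 2 ** 32  # otherwise assume a 64bit wrap
    value = delta / step_seconds
    if ds_min is not None and value < ds_min:
        return None
    if ds_max is not None and value > ds_max:
        return None
    return value


# A reset from 4,000,000,000 to 1,000 on a 1Mbps link, 5min step:
print(rate("COUNTER", 4_000_000_000, 1000, 300))                  # bogus spike
print(rate("COUNTER", 4_000_000_000, 1000, 300, ds_max=125_000))  # None
print(rate("DERIVE", 4_000_000_000, 1000, 300, ds_min=0))         # None
```

Note that in this model DERIVE with min=0 also turns every genuine wrap into an unknown, which is the trade-off against COUNTER with a proper max.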
> --Toby.
>
> --
> Unsubscribe mailto:rrd-users-request at list.ee.ethz.ch?subject=unsubscribe
> Help mailto:rrd-users-request at list.ee.ethz.ch?subject=help
> Archive http://www.ee.ethz.ch/~slist/rrd-users
> WebAdmin http://www.ee.ethz.ch/~slist/lsg2.cgi