[rrd-users] Re: odd spikes due to early resets

BAARDA, Don don.baarda at baesystems.com
Tue Feb 20 00:21:02 MET 2001


G'day,

> -----Original Message-----
> From:	Tobias Weingartner [SMTP:weingart at cs.ualberta.ca]
> Sent:	Tuesday, February 20, 2001 9:18 AM
> To:	BAARDA, Don
> Cc:	'Sasha Mikheev'; Matt Ashfield; RRD users
> Subject:	[rrd-users] Re: odd spikes due to early resets 
> 
> 
> On Tuesday, February 20, "BAARDA, Don" wrote:
> > > -----Original Message-----
> > > From:	Sasha Mikheev [SMTP:sasha at netvision.net.il]
> > > Sent:	Tuesday, February 20, 2001 3:06 AM
> > > To:	Matt Ashfield
> > > Cc:	RRD users
> > > Subject:	[rrd-users] Re: odd spikes due to early resets
> > > 
> > > 
> > > 
> > > You can define your archive as DERIVE with minimum value of 0. It
> > > will take care of the problem.
> > > 
> > 	Where did this idea come from? The _best_ solution is to set a
> > proper max.
> 
> Actually, even with the proper max, the spikes can still drown out
> the real traffic (1Gbps pipe, traffic in the range of 20-30Mbps).
> 
	With a 32bit counter, a 1Gbps pipe can potentially wrap in 34
seconds... which means you want to be running with step <= 30 or a 64bit
counter to avoid getting garbage every time there is a genuine traffic
spike. With a 32bit counter and 5min step, any rate above 114Mbps wraps the
counter more than once per step, so you will never see it; you'll get
garbage instead.

	With a 64bit counter, a 1Gbps pipe will take 4676 years to wrap.
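
	The wrap figures above can be checked with a few lines of Python.
This is only a sketch: it assumes the counter counts bytes (octets) and the
link runs flat out, and the function names are made up for illustration.

```python
# Sketch: how long a byte counter takes to wrap, and the highest rate
# that a given step can measure with at most one wrap.
# Assumes the counter counts bytes and the link is fully loaded.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wrap_seconds(counter_bits, link_bps):
    """Seconds until a counter of counter_bits wraps at link_bps bits/sec."""
    bytes_per_second = link_bps / 8
    return 2 ** counter_bits / bytes_per_second


def measurable_max_bps(counter_bits, step_seconds):
    """Highest rate in bits/sec that causes at most one wrap per step."""
    return 2 ** counter_bits / step_seconds * 8


# 32bit counter on a 1Gbps pipe, in seconds
print(round(wrap_seconds(32, 1e9), 1))
# 64bit counter on a 1Gbps pipe, in years
print(round(wrap_seconds(64, 1e9) / SECONDS_PER_YEAR))
# 32bit counter with a 5min step, in Mbps
print(round(measurable_max_bps(32, 300) / 1e6, 1))
```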

	You can calculate the probability that a reset will be
misinterpreted as a wrap from your specified_max and the measurable_max,
where the measurable_max is the highest possible reading you could get
without a max setting. A reset (or any random counter "jump") in theory can
result in a bandwidth reading anywhere in the range 0 to the measurable_max
with an even probability (more on this assumption later). Any result in the
range 0 to specified_max will be incorrectly interpreted as a valid value.
Hence the probability of incorrectly interpreting a reset as a valid wrap
is:

	err_probability = (specified_max / measurable_max)

	where:

	measurable_max = (counter_max / step)

	For a typical application of a 32bit counter, 1Mbps interface, and
5min step this works out as:

	err_probability = ( (10^6 / 8) / ( 2^32 / 300) ) = 0.87%

	This is _very_ low!!! You can simply multiply this by your max in
Mbps to get the probability of a mistake when using a 32bit counter and 5min
step... ie a 20Mbps max is 20 x 0.87 = about 17% chance of a mistake. The
probability of mistaking a reset for a wrap only starts to get high when
your max becomes significant relative to the measurable_max... ie when you
are getting close to getting garbage from counter wraps anyway.
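
	As a rough check of the formula, here is a small Python sketch. It
assumes a byte counter and the even distribution described above; the
function name and its defaults are only illustrative. For a 32bit counter
and 5min step it gives a little under 0.9% per 1Mbps of max.

```python
def err_probability(specified_max_bps, counter_bits=32, step_seconds=300):
    """Chance that a reset is accepted as a valid wrap.

    Assumes a byte counter and that a reset yields a reading spread
    evenly over 0..measurable_max.
    """
    specified_max = specified_max_bps / 8               # bytes/sec
    measurable_max = 2 ** counter_bits / step_seconds   # bytes/sec
    # A max above the measurable range accepts every reading.
    return min(1.0, specified_max / measurable_max)


for mbps in (1, 20, 100):
    print(f"{mbps}Mbps max, 32bit: {err_probability(mbps * 1e6):.2%}")

print("1Gbps max, 64bit:", err_probability(1e9, counter_bits=64))
```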

	If you are using a 64bit counter, the probability of mistaking a
reset for a counter wrap is so close to zero, even for a 1Gbps interface,
that you can ignore it. This is a good argument for using 64bit counters...
in combination with a suitable max it can eliminate "spikes" more
"correctly" than any other technique.

	Note that I assumed a reset produces an evenly distributed random
result. That requires resets to be equally likely at any counter value... ie
a reset is just as likely when the counter is at 1 as when it is at 2^32-1.
This is true if the counter wraps many times between resets. In reality,
particularly with 64bit counters, resets usually happen more often than
wraps, so a reset is more likely to catch the counter at a low value than at
a high one. A low value before the reset leaves a large apparent wrap
distance, so resets usually produce larger readings, which are more likely
to exceed the max. The even-distribution assumption is therefore
conservative, and the real probability of errors is lower...
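
	To see why the even-distribution assumption is conservative, here
is a sketch under a simplified model of my own: the counter value just
before a reset is spread evenly over 0..highest_counter_at_reset, and the
few bytes counted after the reset are ignored. The names are hypothetical.

```python
def reset_err_probability(specified_max_bps, highest_counter_at_reset,
                          counter_bits=32, step_seconds=300):
    """Chance a reset is accepted as a wrap when the pre-reset counter
    value is uniform over [0, highest_counter_at_reset).

    Simplified model: bytes counted after the reset are ignored.
    """
    modulus = 2 ** counter_bits
    # Largest per-step delta (bytes) that still passes the max check.
    accepted_delta = specified_max_bps / 8 * step_seconds
    # The wrap-corrected delta is modulus - old_value, so only old values
    # at or above this threshold give a reading under the max.
    threshold = max(0.0, modulus - accepted_delta)
    accepted_range = max(0.0, highest_counter_at_reset - threshold)
    return accepted_range / highest_counter_at_reset


# Counter wraps many times between resets: any value is equally likely.
print(reset_err_probability(20e6, 2 ** 32))
# Resets always happen before the counter is half way: never accepted.
print(reset_err_probability(20e6, 2 ** 31))
```

	With the full range this matches specified_max / measurable_max;
with resets confined to the lower half of the range, no reset passes a
20Mbps max at all.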

> One solution is to put up a NaN every time the counter rolls around
> or is reset.  At least I've found that to be more acceptable than
> the other way around...
> 
> Wish rrdupdate had an option to put up NaN's (don't do counter
> wrap-around calculation) when the counters reset/wrap.
> 
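	It sort of does... this is exactly what DERIVE with min=0 does.
DERIVE does no wrap correction, so any drop in the counter gives a negative
rate, which min=0 rejects and stores as unknown.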
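
	The difference can be sketched in Python. This is a simplified
model of the two data source types, not RRDtool's actual code: the function
names are made up, only the 32bit wrap case is handled, and None stands in
for an unknown (NaN) value.

```python
def counter_rate(old, new, seconds, counter_bits=32, max_rate=None):
    """COUNTER-style: a decrease is assumed to be a counter wrap."""
    delta = new - old
    if delta < 0:
        delta += 2 ** counter_bits   # wrap correction
    rate = delta / seconds
    if max_rate is not None and rate > max_rate:
        return None                  # above max: stored as unknown
    return rate


def derive_rate(old, new, seconds, min_rate=0, max_rate=None):
    """DERIVE-style: no wrap correction, so a decrease gives a negative
    rate, which min_rate=0 turns into unknown."""
    rate = (new - old) / seconds
    if min_rate is not None and rate < min_rate:
        return None
    if max_rate is not None and rate > max_rate:
        return None
    return rate


# A reset: the counter drops from 4,000,000,000 to 1,000 over 5 minutes.
print(counter_rate(4_000_000_000, 1000, 300))   # bogus rate from "wrap"
print(derive_rate(4_000_000_000, 1000, 300))    # None (unknown)

# A genuine wrap: 1000 bytes counted across the 2^32 boundary.
print(counter_rate(4294967000, 704, 300))       # real rate recovered
print(derive_rate(4294967000, 704, 300))        # None: the sample is lost
```

	The trade-off is visible in the last two lines: DERIVE with min=0
never produces a spike, but it also throws away the sample at every genuine
wrap.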

> --Toby.
> 
> --
> Unsubscribe mailto:rrd-users-request at list.ee.ethz.ch?subject=unsubscribe
> Help        mailto:rrd-users-request at list.ee.ethz.ch?subject=help
> Archive     http://www.ee.ethz.ch/~slist/rrd-users
> WebAdmin    http://www.ee.ethz.ch/~slist/lsg2.cgi
