[rrd-users] Re: odd spikes due to early resets

Wed Feb 21 01:34:50 MET 2001

Clifton Royston wrote:
> 
> > 	err_probability = ( (10^6 / 8) / ( 2^32 / 300) ) = 0.8% 
> > 
> > 	This is _very_ low!!! 
> 
> I have to disagree with you on this, if I understand what you are
> calculating!
> 
> A probability of nearly 1% in a given sample is a very high probability
> when you are doing samples on thousands of interfaces, thousands of
> times per day.  

I think that this is correct in itself and I also think that there
are other factors to take into account.

1) It doesn't matter whether the number of erratic wraps (that is:
   resets resulting in a legal update) is low or high.  Even if you
   can say that it happens only 1:10000, you still have a problem
   when it does happen.

2) If your average update is very low compared to the interface speed,
   the chance of a wrap being erratic increases (motivation below).

3) There is only one real way to detect a counter reset and that is
   by asking the device.  Of course the device needs to support that.
   It seems that in Real Life most devices don't support this, or if
   they do, we don't know about the mechanism.  We can detect a
   device reset (and should do so, in the front end), but detecting
   a "clear counters" type of command is much harder.
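
A minimal sketch of the front-end check for a device reset, assuming
the front end also polls some uptime value from the device (SNMP
sysUpTime would be one candidate; that choice is my assumption, as is
the function name).  If the uptime went backwards the device was
restarted, so the sample is stored as unknown ("U") rather than as a
counter value.  It does nothing for a "clear counters" command.

```python
def value_for_update(prev_uptime, uptime, counter):
    """Return the value to hand to the update: the counter, or 'U'.

    prev_uptime, uptime: uptime readings from the previous and the
    current poll (hypothetical inputs, any monotonic unit will do).
    """
    if prev_uptime is not None and uptime < prev_uptime:
        # Uptime went backwards: the device restarted, so the counter
        # started again from zero.  Store the sample as unknown.
        return "U"
    return str(counter)


print(value_for_update(1000, 1300, 123456))  # normal poll
print(value_for_update(1300, 50, 789))       # poll after a restart
```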

Why do I think the problem is bigger for interfaces with low traffic
compared to their speed?

Consider when a reset can be mistaken for a wrap.  This is only
possible once the counter has crossed a certain value.  Below this
value RRDtool can detect the error and thus there is no problem.  We
can safely ignore these resets and move on to the problem zone.

The problem zone can be calculated.  Instead of using real formulas
with variables, let's look at a real example: assume a five minute
interval and a 100 Mbps interface.

The largest update the interface can produce in one interval is
300 * 12.5M (seconds times bytes per second) = 3,750 MBytes.  The
problem zone is therefore 3,750 MBytes large.  Once the counter has
reached its maximum minus these 3,750 MBytes, three things may happen:
1) An update occurs which moves you outside the problem zone
2) An update occurs which does not move you outside the problem zone
3) A reset occurs, which causes the problem

Both #1 and #3 move you out of the problem zone: #1 without a spike,
#3 with one.
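
The arithmetic for the example can be written out as a small sketch.
The names are mine and the numbers are those of the example (32-bit
counter, 300 seconds, 100 Mbps); it is an illustration, not how
RRDtool itself is coded.

```python
COUNTER_MAX = 2**32               # a 32-bit counter wraps at this value
STEP_SECONDS = 300                # five minute interval
LINK_BYTES_PER_SEC = 100e6 / 8    # 100 Mbps = 12.5 MBytes per second


def problem_zone_bytes(step=STEP_SECONDS, rate=LINK_BYTES_PER_SEC):
    """Largest delta the interface can produce in one interval."""
    return step * rate


def in_problem_zone(counter, step=STEP_SECONDS, rate=LINK_BYTES_PER_SEC):
    """True if a reset at this counter value would look like a legal wrap."""
    return counter >= COUNTER_MAX - problem_zone_bytes(step, rate)


zone = problem_zone_bytes()
print(zone / 1e6, "MBytes")                    # size of the zone
print(zone / COUNTER_MAX)                      # fraction of the counter range
print(in_problem_zone(100), in_problem_zone(COUNTER_MAX - 1))
```

For these numbers the zone covers roughly 87% of the 32-bit range.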

When the interface moves little traffic, there is a series of small
deltas, each of which is a case of #2.  If you are fortunate enough
for #1 to happen, there is no problem.  However, if the counter gets
reset for whatever reason while you are still inside the zone, #3
happens.  Since there are many updates that are a case of #2 and only
one update that is a case of #1, this gives a large number of updates
and thus a long time frame in which a counter reset causes a problem.
Note that I understand that there is also a window where a reset
doesn't cause a problem.  IMHO this is not important for this
reasoning.

Now, what happens is that the lower your average delta is, the higher
the number of #2 cases is in comparison to #1 cases.  At every
interval inside the zone you have a chance of encountering the wrap
problem, and thus the problem decreases as the deltas increase.

Furthermore, an erratic spike will on average be 50 Mbps (half of the
interface speed, as the counter value at the time of the reset is
random).  If normal traffic is low, this spike really is a spike and
thus a problem.  If normal traffic is high, the spike may not be
visible at all and thus it is no problem.
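
To put rough numbers on this reasoning, here is a sketch using the
same example figures.  It assumes traffic is constant at the average
rate and that the counter restarts at (about) zero after the reset;
both are simplifications of mine, and the function names are made up.

```python
COUNTER_MAX = 2**32
STEP = 300                    # seconds between updates
LINK_RATE = 12.5e6            # bytes per second, 100 Mbps
ZONE = STEP * LINK_RATE       # 3,750 MBytes


def updates_inside_zone(avg_bytes_per_sec):
    """Approximate number of intervals the counter spends in the zone."""
    return ZONE / (avg_bytes_per_sec * STEP)


def rate_seen_after_reset(old, new, step=STEP):
    """Bytes per second computed when a reset is taken for a 32-bit wrap."""
    delta = new + COUNTER_MAX - old
    return delta / step


print(updates_inside_zone(12.5e6))   # interface running at full speed
print(updates_inside_zone(125e3))    # interface averaging 1 Mbps

# Reset while the counter is halfway through the zone:
spike = rate_seen_after_reset(COUNTER_MAX - ZONE / 2, 0)
print(spike * 8 / 1e6, "Mbps")
```

At full speed the counter is inside the zone for about one update; at
an average of 1 Mbps it is there for about a hundred.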

In any case it would be better to use 64-bit counters for this kind
of interface.  They provide better support for traffic near 100 Mbps
(suppose your measuring points are 344 seconds apart: for a 32-bit
counter there is then no difference between 100 Mbps and 0 Mbps) and,
even more important, it takes many, many years to even reach the
problem zone :)
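
The 344 seconds and the "many years" can be checked with a few lines.
Again only a sketch with names of my own choosing; it assumes the
interface runs at a steady 100 Mbps.

```python
LINK_RATE = 100e6 / 8         # bytes per second at 100 Mbps


def seconds_to_wrap(counter_bits, bytes_per_sec=LINK_RATE):
    """Time for a counter of the given width to go once around."""
    return 2**counter_bits / bytes_per_sec


def apparent_delta(true_delta, counter_bits=32):
    """What is left of a delta after the counter has wrapped."""
    return int(true_delta) % 2**counter_bits


t32 = seconds_to_wrap(32)
years64 = seconds_to_wrap(64) / (365.25 * 24 * 3600)
print(t32, "seconds for 32 bits")
print(years64, "years for 64 bits")

# Full speed for 344 seconds, seen through a 32-bit counter:
seen = apparent_delta(344 * LINK_RATE)
print(seen / 344 * 8 / 1e6, "Mbps apparent")
```

With 32 bits the counter goes around in a little under 344 seconds,
so a full-speed interval of that length leaves a delta close to zero.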

cheers,
-- 
   __________________________________________________________________
 / alex at slot.hollandcasino.nl                  alex at ergens.op.het.net \
| work                                                         private |
| My employer is capable of speaking therefore I speak only for myself |
+----------------------------------------------------------------------+
| Technical questions sent directly to me will be nuked. Use the list. | 
+----------------------------------------------------------------------+
| http://faq.mrtg.org/                                                 |
| http://rrdtool.eu.org  --> tutorial                                  |
+----------------------------------------------------------------------+
