[rrd-users] Handling frequent counter-wraps

Alex van den Bogaerdt alex at ergens.op.het.net
Thu Mar 8 16:32:07 CET 2007


On Thu, Mar 08, 2007 at 03:30:16PM +0100, Rickard Dahlstrand wrote:

> My problem is that I can't get the server to send more than one report
> per every five minute-period.

Ack.

> 1. Is there a way to get rrdtool to smooth over the wrap and estimate
> what should be where it's now unknown?

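You could explore the xff setting, which controls how much of a
consolidation interval may be unknown before the consolidated value itself
becomes unknown.  This is the "X-Files factor", as strange things will
happen to your data.  It may look nicer, but it is not as accurate as your
original data.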

> 2. Is there a way to tell rrdtool that if it gets a value lower than 0,
> it should use 0 as the base instead of adding unknown?

You won't get a value lower than 0.  You would get a value lower than the
previous value, and rrdtool computes:
- for COUNTER:  a very high rate (it assumes a counter wrap occurred)
                e.g. 60,000 -> 10
                negative -> cannot happen -> counter has wrapped, add
                2^32 to get 60,000 -> 2^32+10
                diff = 2^32+10-60,000 = almost 2^32
                per 300 seconds, rate = approx. 14.3M
- for DERIVE:   a negative rate (this counter type can decrement)
                60,000 -> 10
                negative can happen
                rate = (10-60,000)/300 = almost -200
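
The two computations above can be sketched in a few lines of Python.  This
is only an illustration of the arithmetic, not rrdtool's actual code; the
function names are made up and a 32-bit counter is assumed.

```python
COUNTER_MODULUS = 2**32  # assumption: the counter is 32 bits wide


def counter_rate(prev, curr, seconds):
    """Rate as a COUNTER data source sees it: a decrease is taken as a wrap."""
    diff = curr - prev
    if diff < 0:
        # Negative cannot happen for a counter, so assume it wrapped once.
        diff += COUNTER_MODULUS
    return diff / seconds


def derive_rate(prev, curr, seconds):
    """Rate as a DERIVE data source sees it: a decrease gives a negative rate."""
    return (curr - prev) / seconds


print(counter_rate(60_000, 10, 300))  # roughly 14.3 million per second
print(derive_rate(60_000, 10, 300))   # just under -200 per second
```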

RRDtool can ignore "wrong" rates (min and max allowable rate).
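
As a rough sketch of what that means: the limits are the last two fields of
the DS definition given to rrdtool create (DS:name:type:heartbeat:min:max),
and a rate outside them is stored as unknown.  The data source name, the
heartbeat and the limits below are made-up values, and accept_rate only
mimics the effect; it is not an rrdtool function.

```python
def ds_definition(name, dst, heartbeat, min_rate, max_rate):
    """Build a DS argument for 'rrdtool create' (name:type:heartbeat:min:max)."""
    return f"DS:{name}:{dst}:{heartbeat}:{min_rate}:{max_rate}"


def accept_rate(rate, min_rate, max_rate):
    """Mimic the min/max check: out-of-range rates become unknown (None here)."""
    if rate < min_rate or rate > max_rate:
        return None
    return rate


# Hypothetical values: a counter called "queries" that never exceeds 10000/s.
print(ds_definition("queries", "COUNTER", 600, 0, 10000))
print(accept_rate(14_316_357.7, 0, 10000))  # the bogus wrap rate is rejected
print(accept_rate(180, 0, 10000))           # a normal rate passes through
```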

You could keep track of previous counter values and, if you detect a
value lower than the previous one, you will know the zone has been
updated and thus a counter reset occurred.  At that point, you could
estimate how long ago this update occurred.  If your normal rate is e.g.
180 per second and your current value is 360, then the update occurred two
seconds ago.  You can update your database with timestamps (not "N" for
now, but real unix timestamps like 1173365985).  Subtract two seconds and
update with zero, then use the true time and update with 360.  Result: one
unknown interval, one estimated interval.
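
A minimal sketch of that estimate, assuming you know a "normal" rate and
that the strings produced are what you would hand to rrdtool update as
timestamp:value arguments.  The function name is made up.

```python
def reset_updates(now, current_value, normal_rate):
    """Guess when the counter was reset and return two timestamp:value updates."""
    # Time needed to count from 0 to current_value at the normal rate.
    elapsed = current_value / normal_rate
    # Keep at least one second between the two updates so the timestamps differ.
    reset_time = now - max(1, round(elapsed))
    return [f"{reset_time}:0", f"{now}:{current_value}"]


print(reset_updates(1173365985, 360, 180))
# ['1173365983:0', '1173365985:360']
```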

There's no way you can accurately determine what the counter value would
have been when the zone was updated, but you can estimate in a similar
fashion.  You have guessed when the update took place, and together with
the estimated rate you could fake another update.

This all said: I think you would be reimplementing the xff logic, so maybe
you should just accept those unknown intervals and hide them.

Ideally you get the counter value just before a zone update takes place.
The shorter the difference in time, the better.  If you detect a counter
reset (thus a zone update) then you know it was done about five minutes
ago.  Known time range, known values, almost no guessing!

Have xff slightly larger than 0.
Remember the last counter value, or fetch it from the database each time.
Remember the last update time,   or fetch it from the database each time.
If current counter value < last counter value:
   update last time + 1 second with 0
   update this time with current counter value
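
The recipe above could look like this in Python.  It is a sketch only: the
function name is made up, and it just returns the timestamp:value strings
you would pass to rrdtool update instead of calling rrdtool itself.

```python
def plan_updates(last_time, last_value, now, current_value):
    """Return the timestamp:value updates to send for this sample."""
    if current_value < last_value:
        # Counter reset detected: restart from zero one second after the
        # previous update, then store the real sample at the real time.
        return [f"{last_time + 1}:0", f"{now}:{current_value}"]
    # Normal case: a single update with the current counter value.
    return [f"{now}:{current_value}"]


# Reset case: previous sample was 60000, now the counter reads 10.
print(plan_updates(1173365685, 60000, 1173365985, 10))
# ['1173365686:0', '1173365985:10']
```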

Effectively you will be dividing the current (not yet normalized) interval
in two parts:
1 second unknown data (make sure 60,000->0 doesn't result in a valid rate)
299 seconds known data (0->current counter)

That second part should have been 300 seconds long.  As a result the computed
rate will be slightly higher than reality.  I do not consider 0.3 percent
to be a problem but of course YMMV.
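
For other interval lengths the error is easy to work out: the same counter
increase gets divided by 299 seconds instead of 300.  A small sketch, with
a made-up function name:

```python
def overestimate(interval, unknown):
    """Relative error when 'unknown' seconds are cut from the interval."""
    return interval / (interval - unknown) - 1


print(f"{overestimate(300, 1):.2%}")  # 0.33%
```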

This one-second unknown part should be invisible, unless your xff setting
does not allow for it (RTFM xff in man rrdcreate).

There may be different ways to deal with this problem.  Just consider what
you know, and how rrdtool learns about this information.  The more you know,
the more accurate your data will be.  Choices you make may affect the amount
of information available.

HTH
-- 
Alex van den Bogaerdt
http://www.vandenbogaerdt.nl/rrdtool/


