[rrd-users] bug in dealing with counter wrap in 1.4.3?
Rick Jones
rick.jones2 at hp.com
Wed Jun 15 19:20:31 CEST 2011
On Wed, 2011-06-15 at 06:24 +0200, Tobias Oetiker wrote:
> note that wrap detection is only active in COUNTER mode ... in any
> event, make sure you use the latest stable snapshot for testing ...
> I am not aware of any problem ... you may want to test with some
> crafted data ...
So, I managed to find a place to slap 1.4.5.002185:
$ rrdtool version
RRDtool 1.4.5.002185 Copyright 1997-2010 by Tobias Oetiker
<tobi at oetiker.ch>
Compiled Jun 15 2011 09:53:09
And creating with COUNTER no longer shows the discontinuities I
thought I saw on 1.4.3 (from Ubuntu) with both COUNTER and DERIVE.
I cannot find the reference to what led me to use DERIVE in the first
place - I think it had something to do with dealing with devices that
might reset with some frequency. Ah, wait, there it is - in the
rrdcreate discussion:
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
DERIVE
will store the derivative of the line going from the last to the
current value of the data source. This can be useful for gauges,
for example, to measure the rate of people entering or leaving a
room. Internally, derive works exactly like COUNTER but without
overflow checks. So if your counter does not reset at 32 or 64
bit you might want to use DERIVE and combine it with a MIN value
of 0.
NOTE on COUNTER vs DERIVE
by Don Baarda <don.baarda at baesystems.com>
If you cannot tolerate ever mistaking the occasional counter
reset for a legitimate counter wrap, and would prefer "Unknowns"
for all legitimate counter wraps and resets, always use DERIVE
with min=0. Otherwise, using COUNTER with a suitable max will
return correct values for all legitimate counter wraps, mark
some counter resets as "Unknown", but can mistake some counter
resets for a legitimate counter wrap.
For a 5 minute step and 32-bit counter, the probability of
mistaking a counter reset for a legitimate wrap is arguably
about 0.8% per 1Mbps of maximum bandwidth. Note that this
equates to 80% for 100Mbps interfaces, so for high bandwidth
interfaces and a 32bit counter, DERIVE with min=0 is probably
preferable. If you are using a 64bit counter, just about any max
setting will eliminate the possibility of mistaking a reset for
a counter wrap.
Of course, more than a bit of "shame on me" for blithely forgetting that
DERIVE does no wrap protection, but there it is, still suggesting
that "for high bandwidth interfaces and a 32bit counter, DERIVE with
min=0 is probably preferable." I wonder whether that suggestion still
applies in the age of 1GbE and 10GbE interfaces (yes, one should be
getting 64-bit counters from/for those, but still...) or whether that text
could use some editing.
I also wonder if this explains some of the emails I've been seeing on
the ntop users mailing list - as ntop 4.0.3 at least is using DERIVE
with a min of 0 (and that may have sealed the deal on my having chosen
it for my own nefarious porpoises).
if (isCounter) {
  /*
    The use of DERIVE should avoid spikes on graphs when
    ntop is restarted.
    Patch courtesy of Graeme Fowler <graeme at graemef.net>
  */
  safe_snprintf(__FILE__, __LINE__, counterStr, sizeof(counterStr),
                "DS:counter:%s:%d:0:%u",
                "DERIVE" /* "COUNTER" */,
                heartbeat, topValue);
} else {
  /* ... (rest of the ntop source not quoted) ... */
}
rick jones