[rrd-users] bug in dealing with counter wrap in 1.4.3?

Wed Jun 15 19:20:31 CEST 2011

On Wed, 2011-06-15 at 06:24 +0200, Tobias Oetiker wrote:
> note that wrap detection is only active in COUNTER mode ... in any
> event, make sure you use the latest stable snapshot for testing ...
> I am not aware of any problem ... you may want to test with some
> crafted data ...

So, I managed to find a place to slap 1.4.5.002185:

$ rrdtool version
RRDtool 1.4.5.002185  Copyright 1997-2010 by Tobias Oetiker <tobi at oetiker.ch>
               Compiled Jun 15 2011 09:53:09

With that build, an RRD created with COUNTER no longer shows the
discontinuities I thought I saw on 1.4.3 (from Ubuntu) with both
COUNTER and DERIVE.

I cannot find the reference to what led me to use DERIVE in the first
place - I think it had something to do with dealing with devices that
might reset with some frequency.  Ah, wait, there it is - in the
rrdcreate discussion:

http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html

DERIVE

        will store the derivative of the line going from the last to the
        current value of the data source. This can be useful for gauges,
        for example, to measure the rate of people entering or leaving a
        room. Internally, derive works exactly like COUNTER but without
        overflow checks. So if your counter does not reset at 32 or 64
        bit you might want to use DERIVE and combine it with a MIN value
        of 0.

        NOTE on COUNTER vs DERIVE

        by Don Baarda <don.baarda at baesystems.com>

        If you cannot tolerate ever mistaking the occasional counter
        reset for a legitimate counter wrap, and would prefer "Unknowns"
        for all legitimate counter wraps and resets, always use DERIVE
        with min=0. Otherwise, using COUNTER with a suitable max will
        return correct values for all legitimate counter wraps, mark
        some counter resets as "Unknown", but can mistake some counter
        resets for a legitimate counter wrap.

        For a 5 minute step and 32-bit counter, the probability of
        mistaking a counter reset for a legitimate wrap is arguably
        about 0.8% per 1Mbps of maximum bandwidth. Note that this
        equates to 80% for 100Mbps interfaces, so for high bandwidth
        interfaces and a 32bit counter, DERIVE with min=0 is probably
        preferable. If you are using a 64bit counter, just about any max
        setting will eliminate the possibility of mistaking a reset for
        a counter wrap.
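
To make the distinction above concrete, here is a minimal sketch of how I
read the quoted text. It is not taken from the rrdtool source; the function
name rate() and its parameters are made up for illustration, NaN stands in
for "Unknown", and the 32-bit-then-64-bit wrap guess is my assumption based
on the "reset at 32 or 64 bit" wording.

```python
import math

WRAP_32 = 2**32
WRAP_64 = 2**64


def rate(ds_type, prev, curr, seconds, ds_min=None, ds_max=None):
    """Per-second rate for one update interval; NaN means 'Unknown'."""
    delta = curr - prev
    if ds_type == "COUNTER" and delta < 0:
        # COUNTER assumes a negative difference is a wrap, not a reset.
        delta += WRAP_32                  # try a 32-bit wrap first
        if delta < 0:
            delta += WRAP_64 - WRAP_32    # otherwise assume a 64-bit wrap
    r = delta / seconds
    # min/max are the only protection against implausible rates.
    if ds_min is not None and r < ds_min:
        return math.nan
    if ds_max is not None and r > ds_max:
        return math.nan
    return r


# Legitimate 32-bit wrap: COUNTER recovers the 1000-unit increase,
# DERIVE with min=0 reports Unknown.
print(rate("COUNTER", 4294967000, 704, 300))
print(rate("DERIVE", 4294967000, 704, 300, ds_min=0))

# Device reset from 4,000,000,000 to 5,000: a tight max catches it,
# a loose max lets it through as a spike.
print(rate("COUNTER", 4_000_000_000, 5000, 300, ds_max=125_000))
print(rate("COUNTER", 4_000_000_000, 5000, 300, ds_max=12_500_000))
```

The last two calls are the trade-off the note describes: the tighter the
max relative to the counter range, the fewer resets slip through as wraps.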

Of course, more than a bit of "shame on me" for blithely forgetting that
DERIVE doesn't do wrap protection, but there it is, still suggesting
that "for high bandwidth interfaces and a 32bit counter, DERIVE with
min=0 is probably preferable."  I wonder if that suggestion is still
applicable in the age of 1GbE and 10GbE interfaces (yes, one should be
getting 64 bit counters from/for those but still...) or if that text
could use some editing.
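
As a rough check on those percentages, here is a back-of-the-envelope
sketch. The model is my assumption, not something the documentation spells
out: the counter counts octets, max is set to the link rate, and the
counter value just before a reset is uniformly distributed over the counter
range. Under that model a reset is mistaken for a wrap whenever the
apparent wrapped delta fits under max * step. The function names are
hypothetical.

```python
def misread_probability(link_bps, step_s=300, counter_bits=32):
    """Fraction of resets that would pass as a legitimate wrap."""
    max_octets_per_step = link_bps / 8 * step_s
    return min(1.0, max_octets_per_step / 2**counter_bits)


def seconds_to_wrap(link_bps, counter_bits=32):
    """Time for an octet counter to wrap at full line rate."""
    return 2**counter_bits / (link_bps / 8)


for name, bps in [("1Mbps", 1e6), ("100Mbps", 100e6),
                  ("1GbE", 1e9), ("10GbE", 10e9)]:
    print(name,
          f"misread={misread_probability(bps):.2%}",
          f"wrap32={seconds_to_wrap(bps):.1f}s",
          f"misread64={misread_probability(bps, counter_bits=64):.2e}")
```

This gives a bit under 0.9% for 1Mbps, the same ballpark as the "about
0.8%" in the documentation, and it saturates at 100% for 1GbE. More to the
point, at 1Gbps line rate a 32-bit octet counter can wrap in roughly 34
seconds, so with a 5 minute step it can wrap several times between samples
and neither COUNTER nor DERIVE can give the right answer from a 32-bit
counter.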

I also wonder if this explains some of the emails I've been seeing on
the ntop users mailing list - as ntop 4.0.3 at least is using DERIVE
with a min of 0 (and that may have sealed the deal on my having chosen
it for my own nefarious porpoises).

      if(isCounter) {
        /*
          The use of DERIVE should avoid spikes on graphs when
          ntop is restarted.
          Patch courtesy of Graeme Fowler <graeme at graemef.net>
         */
        safe_snprintf(__FILE__, __LINE__, counterStr, sizeof(counterStr),
                      "DS:counter:%s:%d:0:%u",
                      "DERIVE" /* "COUNTER" */,
                      heartbeat, topValue);
      } else {

rick jones