[rrd-users] Re: DERIVE or COUNTER

Alex van den Bogaerdt alex at slot.hollandcasino.nl
Sun Aug 19 00:28:06 MEST 2001


Mike Wright wrote:
> 
> I have been thinking about this one for a while.
> 
> At 17:21 18/08/01 +0200, Alex van den Bogaerdt wrote:
> >Of course, the front end could simply remember the last value and
> >compare this to the current value.  If the current value is lower,
> >send a "U" value to the back end (rrdtool) instead of the value.
> 
> You are losing a lot of data if you do this. Fair enough on a slow serial
> line, but at 100 Meg bits/second you could theoretically get a counter wrap
> in just 343 seconds. If your front end is monitoring fast interfaces then
> you could be discarding a high percentage of your samples. Also, if you log
> "Unknown" into the RRD, I think it takes two samples before RRDtool can log
> a data point again.

This is exactly what happens when you use DERIVE (with a minimum of 0).
The only difference is that the front end does it instead of the back end.
The net result is that any negative delta is turned into an unknown value.
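
A minimal sketch of such a front-end check, in Python.  The class name
CounterFilter and its feed() method are made up for illustration; the only
RRDtool convention relied upon is that "U" stands for unknown in an update.

```python
class CounterFilter:
    """Remember the last counter value and flag backward steps as unknown."""

    def __init__(self):
        self.last = None  # no sample seen yet

    def feed(self, current):
        """Return the string to hand to rrdtool: the value itself or "U"."""
        went_backwards = self.last is not None and current < self.last
        self.last = current  # always remember the newest raw reading
        return "U" if went_backwards else str(current)
```

Feeding it 100, 150, 20, 60 would hand 100, 150, U, 60 to the back end.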

On 100 Mbps links, DERIVE will miss a lot of samples when monitoring
32-bit counters.  Using 64-bit counters is the proper way of handling that;
handling 32-bit wraps properly is going to fail at speeds of over 229 Mbps.
If the counter wraps twice between two polls, it has been incremented by 2^33.
With a 300-second poll interval the rate is thus 2^33 / 300 = 28633115
bytes per second, or 229 Mbps.
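
The arithmetic can be checked with a few lines of Python.  This is only a
worked example; the function names are mine, and the 300-second interval
and 32-bit octet counter are the assumptions used in the figures in this
thread.

```python
POLL_INTERVAL = 300   # seconds between polls (assumed, as in the figure above)
COUNTER_BITS = 32     # width of the octet counter

def wrap_seconds(link_bps, counter_bits=COUNTER_BITS):
    """Seconds for an octet counter to wrap once at full line rate."""
    return (2 ** counter_bits) * 8 / link_bps

def bytes_per_second(octets, interval=POLL_INTERVAL):
    """Average byte rate for a given counter increment over one interval."""
    return octets / interval

def megabits_per_second(octets, interval=POLL_INTERVAL):
    """The same rate expressed in Mbps (10^6 bits per second)."""
    return octets * 8 / interval / 1e6

print(round(wrap_seconds(100e6), 1))          # 343.6 seconds at 100 Mbps
print(int(bytes_per_second(2 ** 33)))         # 28633115 bytes per second
print(round(megabits_per_second(2 ** 33)))    # 229 Mbps
```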

> >There's more a front end could/should do.  For instance, it could
> >try and compensate for time differences between sampling the value
> >and processing it.  Especially on an overloaded system this may
> >help.
> 
> Using sysUpTime is a good way to do this as it is measured in 1/100ths of a
> second. Request the sysUpTime along with the counters from the router and
> compare the sysUpTime to the value from the previous sample and you have a
> time difference accurate to 1/100th of a second.

Indeed, this is what I meant.  Using NTP to synchronize the clocks on
both devices helps, and furthermore you don't really care what the
*exact time* is; you just want to know the exact time *delta*.

The difference between the system clock (or the system uptime) and the
device uptime will more or less be a constant.  Any variance should thus
be caused by jitter in the monitoring system and should be corrected.
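
As a rough sketch of that correction in Python: take the offset between
local time and device uptime at some reference sample, and treat any later
deviation from it as jitter.  The function names are hypothetical, and the
sketch ignores sysUpTime wraps and reboots, which need separate handling.

```python
TICKS_PER_SECOND = 100  # sysUpTime counts hundredths of a second

def boot_offset(local_time, uptime_ticks):
    """Local clock minus device uptime: roughly the device's boot time."""
    return local_time - uptime_ticks / TICKS_PER_SECOND

def corrected_sample_time(local_time, uptime_ticks, reference_offset):
    """Remove poller jitter by trusting the device's uptime for the delta."""
    jitter = boot_offset(local_time, uptime_ticks) - reference_offset
    return local_time - jitter
```

If the first sample gives an offset of 999500 and the next one is processed
2.5 seconds late, the corrected time is 2.5 seconds earlier than local time.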

> I mentioned on the list a while ago that it would be nice for RRDtool to
> take a sysUpTime along with the counters and use this to log more accurate
> samples. RRDtool could also use this to detect a reboot (ie: sysUpTime
> going backwards) and compensate for the counters being reset. 

And I probably replied by saying this is a task for the front end.  If
I didn't do so, I'm doing it now.  This is not something that should go
into RRDtool.  The front end needs to deliver a proper time to RRDtool.
That time is derived from the local system time, so by the time RRDtool
sees it, it already contains the necessary adjustment.

What would be a welcome addition, one which is already in the TODO file,
is using times with a finer granularity.  I think it is time to start
a discussion on this; however, it should be on the rrd-developers list.

> I have a front end which does something like this. When it detects a
> reboot, it logs a NaN into the RRD at time_now - sysUpTime - 1 second, then
> logs zero at time_now - sysUpTime, then logs the sample we have just
> collected from the router at time_now. This way a router reboot has a
> minimal impact on the data collection, both in terms of spikes in the data
> or data points lost.

If you go back into the archives, I think you'll find this is something
I suggested to the list (maybe even to you).
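
For the archives, the sequence described above could look roughly like
this in Python.  The function name is made up; the strings use the
timestamp:value form that rrdtool update accepts, with "U" for unknown.
It assumes the uptime is at least one second and that the computed boot
time is later than the last update already in the RRD, since rrdtool
rejects updates that are not newer than the previous one.

```python
def reboot_updates(time_now, uptime_seconds, sample):
    """Build the three update arguments to log after a detected reboot."""
    boot_time = int(time_now - uptime_seconds)
    return [
        f"{boot_time - 1}:U",          # unknown just before the reboot
        f"{boot_time}:0",              # counters start from zero at boot
        f"{int(time_now)}:{sample}",   # the sample just collected
    ]
```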

> The only thing that caught me out was the sysUpTime counter wraps, many of
> our routers and servers have been up for much more than 497 days.....

Yup, I know :(  New software on one of our switches, it needed a reboot.
The uptime was 1030 days :(  Oh well, at least I got to show it to the
NT people :)

cheers,
-- 
   __________________________________________________________________
 / alex at slot.hollandcasino.nl                  alex at ergens.op.het.net \
| work                                                         private |
| My employer is capable of speaking therefore I speak only for myself |
+----------------------------------------------------------------------+
| Technical questions sent directly to me will be nuked. Use the list. | 
+----------------------------------------------------------------------+
| http://faq.mrtg.org/                                                 |
| http://rrdtool.eu.org  --> tutorial                                  |
+----------------------------------------------------------------------+
