[rrd-users] Re: DERIVE or COUNTER

Mike Wright Mike at auckland-services.freeserve.co.uk
Sat Aug 18 21:41:59 MEST 2001


I have been thinking about this one for a while.

At 17:21 18/08/01 +0200, Alex van den Bogaerdt wrote:
>Of course, the front end could simply remember the last value and
>compare this to the current value.  If the current value is lower,
>send an "U" value to the back end (rrdtool) instead of the value.
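As a rough illustration, the front-end filter Alex describes might look like the sketch below. The class and method names are hypothetical, and it assumes the front end hands rrdtool either the raw counter value or the literal string "U" for unknown.

```python
from typing import Optional, Union


class NaiveCounterFilter:
    """Hypothetical front-end helper: pass counter readings through to
    rrdtool, but replace any reading lower than the previous one with "U"."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def filter(self, value: int) -> Union[int, str]:
        previous, self.last = self.last, value
        if previous is not None and value < previous:
            # Counter went backwards (a wrap or a reset): report unknown.
            return "U"
        return value


f = NaiveCounterFilter()
print([f.filter(v) for v in [100, 200, 50, 80]])  # [100, 200, 'U', 80]
```

Note that every wrap, not just a reset, turns into an "U" with this approach, which is the objection raised in the reply.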

You lose a lot of data if you do this. That is fair enough on a slow serial
line, but at 100 Mbit/s a 32-bit octet counter could theoretically wrap in
just 343 seconds. If your front end is monitoring fast interfaces, you could
be discarding a high percentage of your samples. Also, if you log "Unknown"
into the RRD, I think it takes two samples before RRDtool can log a data
point again, since a rate needs two consecutive known readings.

Conversely, if you are monitoring a slow link (say 56 kbit/s), it doesn't
really matter if the counter gets reset, because the resulting spike in the
graph will probably be much larger than the MAX for the RRD, and RRDtool
will store it as unknown anyway.
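To see why a reset is harmless on a slow link, the sketch below mimics how a COUNTER-style data source turns a reset into a huge wrap-corrected rate, which then exceeds MAX and is recorded as unknown (None here). It assumes a MAX equal to the line rate in octets per second (7000 for 56 kbit/s) and a single assumed wrap; the function name is hypothetical.

```python
from typing import Optional


def stored_rate(previous: int, current: int, seconds: float,
                max_rate: float, counter_bits: int = 32) -> Optional[float]:
    """Rate in octets/s as a COUNTER-style source would compute it, or None
    ("unknown") when the rate is above max_rate."""
    delta = current - previous
    if delta < 0:
        # A decrease is treated as a single wrap, even if it was a reset.
        delta += 2 ** counter_bits
    rate = delta / seconds
    return None if rate > max_rate else rate


MAX_56K = 56_000 / 8  # 7000 octets/s

print(stored_rate(1_000_000, 1_700_000, 300, MAX_56K))  # about 2333.3, normal traffic
print(stored_rate(1_000_000, 5_000, 300, MAX_56K))      # None, reset clipped by MAX
```

On a 100 Mbit/s link the same trick fails: a genuine wrap and a reset can both produce plausible rates below MAX, so the spike is not caught.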

>There's more a front end could/should do.  For instance, it could
>try to compensate for time differences between sampling the value
>and processing it.  Especially on an overloaded system this may
>help.

Using sysUpTime is a good way to do this, as it is measured in hundredths
of a second. Request sysUpTime along with the counters from the router,
compare it to the value from the previous sample, and you have a time
difference accurate to 1/100th of a second.
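A minimal sketch of that calculation, assuming the front end stores the previous octet counter and sysUpTime (in TimeTicks, i.e. hundredths of a second) for each interface. The function name is hypothetical, and neither a reboot nor a sysUpTime wrap is handled here.

```python
def rate_from_uptime(prev_octets: int, cur_octets: int,
                     prev_ticks: int, cur_ticks: int,
                     counter_bits: int = 32) -> float:
    """Octets per second, timed by the agent's sysUpTime rather than by
    when the poller happened to process the reply."""
    elapsed = (cur_ticks - prev_ticks) / 100.0  # TimeTicks -> seconds
    if elapsed <= 0:
        raise ValueError("sysUpTime did not advance (reboot or wrap?)")
    delta = cur_octets - prev_octets
    if delta < 0:
        delta += 2 ** counter_bits  # assume a single octet-counter wrap
    return delta / elapsed


# 30050 ticks = 300.5 s between polls, 601000 octets transferred
print(rate_from_uptime(0, 601_000, 100_000, 130_050))  # 2000.0
```

Because both readings come from the same SNMP response, delays on an overloaded polling host do not distort the measured interval.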

I mentioned on the list a while ago that it would be nice for RRDtool to
take a sysUpTime along with the counters and use it to log more accurate
samples. RRDtool could also use it to detect a reboot (i.e. sysUpTime
going backwards) and compensate for the counters being reset.
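One way such reboot compensation could work is sketched below. It assumes the interface counters restart from zero when the agent boots, so after a reboot the current counter value is everything transferred during the current uptime. The function name is hypothetical, and any traffic between the last poll and the reboot is simply lost. It also treats every decrease in sysUpTime as a reboot, ignoring sysUpTime's own wrap.

```python
from typing import Tuple


def compensated_delta(prev_octets: int, cur_octets: int,
                      prev_ticks: int, cur_ticks: int,
                      counter_bits: int = 32) -> Tuple[int, float]:
    """Return (octets, seconds) covering the interval since the previous
    poll, or since boot if sysUpTime went backwards."""
    if cur_ticks < prev_ticks:
        # Reboot: counters assumed to have restarted from zero at boot.
        return cur_octets, cur_ticks / 100.0
    delta = cur_octets - prev_octets
    if delta < 0:
        delta += 2 ** counter_bits  # ordinary octet-counter wrap
    return delta, (cur_ticks - prev_ticks) / 100.0


print(compensated_delta(5_000_000, 120_000, 90_000_000, 6_000))  # (120000, 60.0)
print(compensated_delta(100, 400, 1_000, 1_500))                 # (300, 5.0)
```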

I have a front end which does something like this. When it detects a
reboot, it logs a NaN into the RRD at time_now - sysUpTime - 1 second, then
logs zero at time_now - sysUpTime, then logs the sample we have just
collected from the router at time_now. This way a router reboot has a
minimal impact on the data collection, both in terms of spikes in the data
and lost data points.
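The three-step backfill can be sketched as a list of `timestamp:value` arguments for `rrdtool update`, with "U" standing in for the NaN. The function name and parameters are hypothetical. Because RRDtool rejects updates that are not newer than the last one, the sketch assumes you track the RRD's last update time and drops any entry whose timestamp is not strictly increasing.

```python
from typing import List


def reboot_updates(now: int, uptime_ticks: int, sample: int,
                   last_update: int) -> List[str]:
    """Hypothetical helper: updates to log after detecting a reboot."""
    boot = now - uptime_ticks // 100  # time_now - sysUpTime, in seconds
    planned = [
        (boot - 1, "U"),     # unknown just before the reboot
        (boot, "0"),         # counters start again from zero at boot
        (now, str(sample)),  # the sample just collected
    ]
    updates: List[str] = []
    last = last_update
    for timestamp, value in planned:
        if timestamp > last:  # keep timestamps strictly increasing
            updates.append(f"{timestamp}:{value}")
            last = timestamp
    return updates


print(reboot_updates(1_000_000, 36_000, 123_456, 999_400))
# ['999639:U', '999640:0', '1000000:123456']
```

The resulting strings could be passed, in order, as arguments to a single `rrdtool update` call on the interface's RRD file.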

The only thing that caught me out was that sysUpTime itself wraps after
about 497 days, and many of our routers and servers have been up for much
longer than that.
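The 497-day figure is 2^32 hundredths of a second. A drop in sysUpTime is therefore ambiguous: it may be a reboot or a wrap. One possible heuristic, sketched below, compares the wrap-corrected tick delta with the wall-clock time the poller measured between the two polls. The function names and the 5-second tolerance are assumptions for illustration, not anything RRDtool provides.

```python
TICKS_MODULUS = 2 ** 32  # sysUpTime is a 32-bit count of hundredths of a second


def uptime_wrap_days() -> float:
    return TICKS_MODULUS / 100 / 86400


def classify_uptime_drop(prev_ticks: int, cur_ticks: int,
                         wall_elapsed: float, tolerance: float = 5.0) -> str:
    """Return "normal", "wrap" or "reboot" for two consecutive readings."""
    if cur_ticks >= prev_ticks:
        return "normal"
    # If it was a wrap, the corrected delta should match the poll interval.
    wrapped_elapsed = (cur_ticks + TICKS_MODULUS - prev_ticks) / 100.0
    if abs(wrapped_elapsed - wall_elapsed) <= tolerance:
        return "wrap"
    return "reboot"


print(round(uptime_wrap_days(), 1))                      # 497.1
print(classify_uptime_drop(2 ** 32 - 1000, 29_000, 300))  # wrap
print(classify_uptime_drop(50_000_000, 12_000, 300))      # reboot
```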

Cheers,

Mike Wright

