[rrd-users] Re: Dilemma in using COUNTER to monitor a system that could crash and reset its parameters

Alex van den Bogaerdt alex at ergens.op.het.net
Fri Apr 16 09:13:34 MEST 2004


On Wed, Apr 14, 2004 at 12:21:38PM -0400, David Lee wrote:

> When the system restarts after a crash, all the new parameter readings
> will be smaller than the previous ones in the rrdtool database. So
> rrdtool will be tricked into thinking that a COUNTER overflow has occurred.
> The new values will be incorrectly displayed in the graph as many
> orders of magnitude greater.

> How do I get around this problem? Do I have to:
> 
> 1) create a new RRD database whenever the system crashes so the data
> in the DB remains fresh? (But I don't want to lose all the historic
> data.)

No.  This would be a silly thing to do.

> 2) or use the data type DERIVE to account for a negative
> rate (i.e. a reading drop)? But DERIVE (and ABSOLUTE for that matter)
> does not have the overflow protection that I desire.

Can you detect a system crash?  If so, use that fact.
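One possible way to detect a crash, sketched here under the assumption that the monitored system also reports its uptime in seconds (SNMP sysUpTime, for instance, counts hundredths of a second and would need converting): compare the implied boot time at two polls. If it moved forward, the system restarted, and the boot time is an estimate of the reset time 't'. The helper name detect_reset and its tolerance parameter are hypothetical.

```python
from typing import Optional


def detect_reset(prev_time: float, prev_uptime: float,
                 now: float, cur_uptime: float,
                 tolerance: float = 5.0) -> Optional[float]:
    """Estimate when the monitored system restarted, if it did.

    Hypothetical helper. Assumes the system reports an uptime in seconds
    that starts again from zero after every restart (and never wraps).
    Comparing boot times rather than raw uptimes also catches a restart
    whose new uptime has already grown past the previous reading.
    """
    prev_boot = prev_time - prev_uptime   # boot time implied by the last poll
    cur_boot = now - cur_uptime           # boot time implied by this poll
    if cur_boot > prev_boot + tolerance:  # tolerance absorbs clock jitter
        return cur_boot                   # estimated reset time 't'
    return None                           # same boot: no reset between polls
```

For example, a poll at 1000 with uptime 900 followed by a poll at 1300 with uptime 50 yields an estimated reset time of 1250, while an uptime of 1200 at 1300 yields None.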

What is happening is this:

- A counter value is fed to RRDtool      value 'x'
  [counters overflow]
- A counter value is fed to RRDtool      value 'y' below 'x'
  [system crashes/system resets/counters reset]
- A counter value is fed to RRDtool      value 'z' below 'y'
- A counter value is fed to RRDtool      some value above 'z'

If you don't let RRDtool know the counters have been reset, it has no way
of knowing the counters are _not_ overflowing.  RRDtool correctly assumes
that an overflow has happened (correctly, because you didn't tell it
otherwise).

The 'y below x' case is not different from the 'z below y' case.
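To make this concrete, here is a minimal Python sketch of how a COUNTER data source is assumed to turn two successive readings into a rate. It is modelled on RRDtool's handling of 32-bit and 64-bit counter wraps, not taken from RRDtool's source, and counter_rate is a hypothetical name. The function only ever sees the previous and the current reading, which is why a reset cannot be told apart from a wrap.

```python
WRAP_32 = 2**32  # modulus of a 32-bit counter
WRAP_64 = 2**64  # modulus of a 64-bit counter


def counter_rate(prev: int, cur: int, seconds: float) -> float:
    """Per-second rate as a COUNTER data source is assumed to compute it.

    Hypothetical helper: any decrease is taken to be a wrap. First a 32-bit
    wrap is assumed; if the delta is still negative, a 64-bit wrap is assumed.
    """
    delta = cur - prev
    if delta < 0:
        delta += WRAP_32
    if delta < 0:
        delta += WRAP_64 - WRAP_32
    return delta / seconds


# A genuine 32-bit wrap and a crash reset go through the same rule.
genuine_wrap = counter_rate(4_294_967_000, 200, 300)
after_reset = counter_rate(1_000_000, 500, 300)
```

The first result is a plausible rate of about 1.65 per second; the second is about 14.3 million per second, the kind of spike described in the question, yet both come from the same rule.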

If you can _not_ detect a counter reset, go for the DERIVE solution.
Set the minimum allowed rate to zero.
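A minimal sketch of the DERIVE alternative, again as a hypothetical derive_rate helper modelled on the documented behaviour rather than taken from RRDtool: the rate is a plain difference with no wrap correction, and a rate below the data source's minimum is stored as unknown (None here). In a DS definition the minimum is the field after the heartbeat, as in DS:requests:DERIVE:600:0:U (the name and heartbeat are illustrative).

```python
from typing import Optional


def derive_rate(prev: float, cur: float, seconds: float,
                minimum: Optional[float] = 0.0,
                maximum: Optional[float] = None) -> Optional[float]:
    """Per-second rate as a DERIVE data source is assumed to compute it.

    Hypothetical helper: no wrap correction is applied. A rate outside
    [minimum, maximum] is discarded and reported as unknown (None).
    """
    rate = (cur - prev) / seconds
    if minimum is not None and rate < minimum:
        return None
    if maximum is not None and rate > maximum:
        return None
    return rate
```

With the minimum at zero, the reset interval becomes a gap instead of a spike. The price, as the question points out, is that a genuine wrap also becomes a gap, so one interval is lost per wrap.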

If you _can_ detect a counter reset, you _know_ that you don't know
the actual counter value:

- A counter value is fed to RRDtool      value 'x'
  [counters overflow]
- A counter value is fed to RRDtool      value 'y' below 'x'
- [The counter value at the moment of the reset is unknown]
  [system crashes/system resets/counters reset  at time 't']
- [The counter value at time 't' was zero]
- A counter value is fed to RRDtool      value 'z' below 'y'
- A counter value is fed to RRDtool      some value above 'z'

Feed 'x', 'y', 'U', '0' (zero), 'z' ... to RRDtool.
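That sequence can be turned into update arguments with a small hypothetical helper. It assumes the reset time 't' is known (for example estimated from the system's uptime), that timestamps are whole seconds, and that placing the 'U' one second before 't' is acceptable. RRDtool rejects an update whose timestamp is not later than the previous one, so the helper checks the ordering. The resulting timestamp:value strings could then be passed to rrdtool update in the usual way.

```python
from typing import List


def reset_updates(last_time: int, reset_time: int,
                  now: int, value_now: int) -> List[str]:
    """Build 'timestamp:value' update strings around a detected counter reset.

    Hypothetical helper following the x, y, U, 0, z pattern:
      - 'U' just before the reset: the counter value there is unknown,
      - '0' at the reset: the counter restarted from zero,
      - the first real reading after the reset.
    last_time is the timestamp of the last update already sent (value 'y').
    """
    unknown_time = reset_time - 1  # assumption: one second before the reset
    if not (last_time < unknown_time < reset_time < now):
        raise ValueError("timestamps must be strictly increasing")
    return [
        f"{unknown_time}:U",
        f"{reset_time}:0",
        f"{now}:{value_now}",
    ]
```

For example, reset_updates(1000, 1950, 2000, 42) returns ['1949:U', '1950:0', '2000:42']. With a COUNTER data source the interval ending at the 'U' and the one ending at the '0' become unknown, and the rate from 0 to 42 is an ordinary positive rate, so no spike appears.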

HTH
Alex
