[rrd-users] Using RRD with service check monitors (Netsaint/Nagios, Mon, BB etc)

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Mon Aug 19 05:13:27 MEST 2002


Dear Ladies and Gentlemen,

I am writing to request your comments on using RRD as a means of
providing persistence to service checks scheduled by Service Monitors
[SM] such as Netsaint/Nagios, Mon, Big Brother etc.

I am finding that the SM schedules the checks to suit itself, and even
though the check interval may be the same as the RRD step size, the check
may try to update the RRD at any time in the interval [step_boundary -
STEP, step_boundary + STEP] (e.g. an RRD that expects to be updated every 5
minutes may in fact be updated at minutes 6, 12, 18, ...).
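
To make the drift concrete, here is a minimal Python sketch (arithmetic only,
no RRD involved). The check times are hypothetical, taken from the 6, 12,
18 minute example above, and offset_from_boundary is an illustrative helper,
not part of any monitor or of rrdtool.

```python
STEP = 300  # RRD step size in seconds, as in the example above


def offset_from_boundary(t, step=STEP):
    """Signed distance in seconds from t to the nearest step boundary."""
    r = t % step
    return r if r <= step // 2 else r - step


# Hypothetical schedule: the monitor runs the check every 6 minutes.
check_times = [m * 60 for m in (6, 12, 18, 24, 30)]

# How far each check lands from the nearest 5 minute boundary.
offsets = [offset_from_boundary(t) for t in check_times]

# Which 5 minute slot each check falls into (slot n covers [n*STEP, (n+1)*STEP)).
slots = [t // STEP for t in check_times]

for t, off, slot in zip(check_times, offsets, slots):
    print("check at minute %2d: %+4d s from boundary, slot %d" % (t // 60, off, slot))
```

With this schedule no check falls into slot 5 (minutes 25 to 30), so one
step goes by without any update at all.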

Even when the check attempts to force the update onto the last step
boundary ( $t = time(); $t = $t - ($t % STEP); RRDs::update ... $t .
':' . $update; ), the data stored in the RRD can differ from what the
real values should be. Here is a fetch from the problem RRD:

                                                       should
                                                       be
Mon Aug 19 12:00:00 2002        1029722400       364.0         44.0 
Mon Aug 19 12:05:00 2002        1029722700        18.0 365     27.0 
Mon Aug 19 12:10:00 2002        1029723000       370.0         44.0 
Mon Aug 19 12:15:00 2002        1029723300        20.0 371     27.0 
Mon Aug 19 12:20:00 2002        1029723600        20.0 375     27.0 
Mon Aug 19 12:25:00 2002        1029723900       377.0         44.0 
Mon Aug 19 12:30:00 2002        1029724200        20.0 378     27.0 
Mon Aug 19 12:35:00 2002        1029724500       378.0 378     44.0 
Mon Aug 19 12:40:00 2002        1029724800       379.0         44.0 
Mon Aug 19 12:45:00 2002        1029725100        20.0 380     27.0 
Mon Aug 19 12:50:00 2002        1029725400       381.0         44.0 
Mon Aug 19 12:55:00 2002        1029725700        21.0 381     27.0 
Mon Aug 19 13:00:00 2002        1029726000       381.0         44.0 
Mon Aug 19 13:05:00 2002        1029726300         0.0          0.0 

I really would prefer not to see the 20.0 20.0 20.0 21.0 (unreal)
values.
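
The boundary alignment described above can be sketched in Python as follows.
This only illustrates the timestamp arithmetic; it does not reproduce or
explain the values in the fetch. The check times are hypothetical, and the
"rejected" label rests on the assumption that rrdtool refuses an update whose
timestamp is not later than the previous one.

```python
STEP = 300  # seconds


def floor_to_step(t, step=STEP):
    """Round a Unix timestamp down to the most recent step boundary."""
    return t - (t % step)


# Hypothetical check completion times (Unix seconds) around 12:00 to 12:15.
check_times = [1029722410, 1029722695, 1029723290, 1029723310]
aligned = [floor_to_step(t) for t in check_times]

statuses = []
last = None  # timestamp of the last update that would have been accepted
for t in aligned:
    if last is not None and t <= last:
        # Two checks in the same step floor to the same timestamp.
        statuses.append("duplicate timestamp, update likely rejected")
    elif last is not None and t - last > STEP:
        # No check landed in one or more intermediate steps.
        statuses.append("skipped %d step(s)" % ((t - last) // STEP - 1))
    else:
        statuses.append("ok")
    if last is None or t > last:
        last = t

for raw, t, s in zip(check_times, aligned, statuses):
    print(raw, "->", t, s)
```

Flooring makes every timestamp land on a boundary, but it cannot guarantee
exactly one update per step: a drifting schedule can still put two checks
into one step and none into the next.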

This check is monitoring a slow-running process (not a router
interface), so a 300 second step is appropriate. Should the step be
reduced, the changes would only show over more samples (this is a
producer process where the first column represents 'Successfully
processed'). I may have to do this.

Also, having the service check sleep until the next step boundary does
not seem feasible, because the check could then run for up to 299 seconds,
and this would require changes to the monitor (which usually kills checks
that don't return in a timely manner).
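
The 299 second figure follows directly from the step arithmetic. A minimal
Python sketch, where seconds_until_next_boundary is an illustrative helper
and the start time is simply one of the boundaries from the fetch above:

```python
STEP = 300  # seconds


def seconds_until_next_boundary(now, step=STEP):
    """Seconds a check would have to sleep to finish exactly on a step boundary."""
    return (step - now % step) % step


boundary = 1029722400  # Mon Aug 19 12:00:00 2002 in the fetch above

# Try every possible start second within one step and keep the longest wait.
waits = [seconds_until_next_boundary(boundary + s) for s in range(STEP)]
worst = max(waits)

print("wait when starting on the boundary:", waits[0])
print("worst case wait:", worst)
```

A check that starts one second after a boundary would have to sleep for
the whole of the worst case before it could return.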

Thank you,

Yours sincerely,

-- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.
