[rrd-developers] Re: Restarting devices

Tobias Oetiker oetiker at ee.ethz.ch
Mon Feb 7 22:36:10 MET 2000


Today you sent me mail regarding Re: [rrd-developers] Re: Restarting devices :

*> What happens if the server is up but you have a slow net or a slow
*> collector? Here is a live sample.  I had a program do pings and 
*> give me rtt results.  Normally it can finish a list of tasks within 
*> 5 minutes.  It will switch to a 10 minute interval or even a 15 minute 
*> interval if things are slow.  Now since all my data are out of 
*> heartbeat range, I end up having no data at all.
*> 
*> I think we ought to separate the heartbeat requirement based on 
*> data type.  Yes, it makes no sense to continue the normal calculation 
*> on the COUNTER after a few hours of blackout.

no ... the correct solution for your setup is to define a mrhb (minimal
required heartbeat) of 15 minutes ... as the range of response times you
expect is between 5 and 15 minutes ... if there is no update for more than
15 minutes something bad is happening and the data you feed cannot be
trusted anymore ...
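
As a rough illustration of the heartbeat rule described above (a sketch only,
not rrdtool code): the helper names heartbeat_for and within_heartbeat are
hypothetical, and the comparison is modelled on the "mrhb >= interval" test
visible in the diff quoted further down. All times are in seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of rrdtool): choose a heartbeat equal to
 * the longest gap, in seconds, that the collector is expected to produce. */
unsigned long heartbeat_for(const unsigned long *expected_gaps, size_t n)
{
    assert(expected_gaps != NULL && n > 0);

    unsigned long longest = expected_gaps[0];
    for (size_t i = 1; i < n; i++) {
        if (expected_gaps[i] > longest)
            longest = expected_gaps[i];
    }
    return longest;
}

/* Hypothetical helper: mirrors the "mrhb >= interval" comparison from the
 * quoted diff. Returns 1 if an update arriving `interval` seconds after the
 * previous one is still within the heartbeat, 0 if it falls outside it. */
int within_heartbeat(unsigned long mrhb, unsigned long interval)
{
    return mrhb >= interval;
}
```

With expected gaps of 300, 600 and 900 seconds this picks a heartbeat of 900;
an update arriving after 1200 seconds would then fall outside it.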

If you want to ignore this problem you can do this as well by defining a
ridiculously high mrhb of let's say 10 years ... no problem ... but as I said
I would recommend against it because if you are not able to update within an
expected interval you should make this problem manifest in the database ...

cheers
tobi
*> Min
*> 
*> In message <Pine.GSO.4.21.0002072040520.23339-100000 at engelberg.ee.ethz.ch>,
*> Tobias Oetiker writes:
*> >Today you sent me mail regarding Re: [rrd-developers] Re: Restarting devices :
*> >
*> >*> >rrdtool only comes up with NaN for an interval if more than Minimal
*> >*> >required heartbeat seconds of the interval are UNKNOWN ... 
*> >*> 
*> >*> Tobi,
*> >*> 
*> >*> Can I request an option to ignore heartbeat when it set to 0 or -1?
*> >*> Something like:
*> >*> 287,289c287
*> >*> <              rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt >= interval &&
*> >*> <              rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt >0
*> >*> <               ) {
*> >*> ---
*> >*> >              rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt >= interval) {
*> >*> 393,394c391
*> >*> <                    > rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt
*> >*> <                    && rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt > 0 ) ||
*> >*> ---
*> >*> >                    > rrd.ds_def[i].par[DS_mrhb_cnt].u_cnt) ||
*> >*> 
*> >*> I can understand that for those data types related to delta time, it is
*> >*> necessary to safeguard the data by using heartbeat.  However, for some 
*> >*> absolute data such as CPU/memory usage, the right action is to round 
*> >*> the timestamp up to the closest step, instead of losing data by NaN.  I 
*> >*> don't think raising the heartbeat (what I do) is the answer.
*> >
*> >ok think of this ... your server goes down and you are not able to collect
*> >performance data for several hours ... what should rrdtool do (assuming you
*> >normally collect every 5 minutes) ...
*> >
*> >cheers
*> >tobi
*> >*> 
*> >*> Min
*> >*> 
*> >*> 

-- 
 ______    __   _
/_  __/_  / /  (_) Oetiker, Timelord & SysMgr @ EE-Dept ETH-Zurich
 / // _ \/ _ \/ / TEL: +41(0)1-6325286  FAX:...1517  ICQ: 10419518 
/_/ \.__/_.__/_/ oetiker at ee.ethz.ch http://ee-staff.ethz.ch/~oetiker

--
Unsubscribe mailto:rrd-developers-request at list.ee.ethz.ch?subject=unsubscribe
Help        mailto:rrd-developers-request at list.ee.ethz.ch?subject=help
Archive     http://www.ee.ethz.ch/~slist/rrd-developers


