[rrd-users] Re: Bug? was RE: rddtool Heartbeat & Step

Tobias Oetiker oetiker at ee.ethz.ch
Fri Aug 10 07:35:10 MEST 2001


Yesterday BAARDA, Don wrote:

 |
 | G'day,

it seems that some code review in this area would be good,
at least to document how it really works :-)

pushed it to my todo list ...

thanks
tobi
 |
 | > -----Original Message-----
 | > From: Blaise Lepeuple [mailto:blaise at yaga.com]
 | > Sent: Tuesday, August 07, 2001 2:10 PM
 | > To: don.baarda at baesystems.com
 | > Subject: rddtool Heartbeat & Step
 | >
 | >
 | > I'm sorry if you are the wrong person to ask this, but you had your email
 | > address on the online man page for "rrd create".
 | >
 | > If I should talk to somebody else, please redirect me to him.
 | [...]
 |
 | It's been a while since I examined and understood the internals of RRD. I
 | did go through it a while ago and satisfied myself that it worked, and since
 | then have been content that it does what it is supposed to.
 |
 | > So here is the creation of the rrd :
 | >
 | > rrdtool create test.rrd -s 10 --start 997147699 DS:tik:COUNTER:10:0:U \
 | >     RRA:AVERAGE:0:1:10
 | [...]
 |
 | I've just had a look at the man page that now includes my description. It
 | includes verbatim my stuff about heartbeat and step, but excluded the bit I
 | had on the end of that email about xff. I'll add it here for your info,
 | because it will help when you start making RRAs with steps>1:
 |
 | 	You are right that "xff" has little effect if you have few "unknown"
 | PDPs, and setting "heartbeat" high is one way of reducing the number of
 | "unknown" PDPs. However, it is worth remembering that "unknowns" can happen
 | for other reasons. When setting "xff", you are deciding how many
 | "unknown" PDPs are acceptable when accumulating into an RRA, and an
 | "unknown" really means that rrd has no idea what the rate for the PDP was.
 | So "xff" is a "garbage threshold" for how much missing input data you can
 | tolerate when accumulating your data into "coarse grain", large "steps",
 | RRAs.
 |
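A minimal sketch of the "garbage threshold" idea above, in Python rather than
rrdtool itself. The function name and the exact comparison (the consolidated
value becomes unknown once the unknown fraction exceeds xff) are assumptions
made for illustration, not a copy of rrdtool's code:

```python
import math

def consolidate_average(pdps, xff):
    """Average a list of PDPs into one consolidated value.

    NaN entries stand for "unknown" PDPs. Assumption: the result is
    unknown as soon as more than xff * len(pdps) of the PDPs are unknown.
    """
    unknown = sum(1 for v in pdps if math.isnan(v))
    if unknown > xff * len(pdps):
        return math.nan
    known = [v for v in pdps if not math.isnan(v)]
    return sum(known) / len(known) if known else math.nan

nan = math.nan
print(consolidate_average([1.0, nan, 1.0, 1.0], xff=0.5))  # 1 of 4 unknown: tolerated
print(consolidate_average([1.0, nan, nan, nan], xff=0.5))  # 3 of 4 unknown: rejected
print(consolidate_average([1.0], xff=0.0))                 # steps=1: the PDP passes through
```

With a single PDP per row (steps=1) the result is simply that PDP, known or
unknown, for any xff below 1 in this sketch.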
 | 	When setting "heartbeat", you are specifying a requirement on your
 | samples. Remember that a long "heartbeat" means that you are happy for
 | multiple PDPs to be estimated from a single sample, which means the
 | individual PDPs are not really accurate. The nice thing about this though is
 | that these not-quite-accurate PDPs accumulate accurately. The individual
 | PDPs are estimated from the average rate over a longer period, hence when
 | you accumulate these PDPs into a single period, the average rate is correct
 | for that period. So "heartbeat" is a "garbage threshold" for how much
 | inaccuracy you can tolerate in your "fine grain", small "steps", RRAs.
 |
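The point about inaccurate PDPs accumulating accurately can be checked with a
small Python sketch. It assumes a COUNTER-style source without wraps, a first
sample on a step boundary, and every interval within heartbeat; the helper
names and the sample data are made up for illustration:

```python
def per_second_rates(samples):
    """Map each second to the average rate of the sample interval covering it."""
    rates = {}
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rate = (c1 - c0) / (t1 - t0)
        for t in range(t0 + 1, t1 + 1):
            rates[t] = rate
    return rates

def pdps(samples, step):
    """Average the per-second rates over each step-sized PDP."""
    rates = per_second_rates(samples)
    first, last = samples[0][0], samples[-1][0]
    return [
        sum(rates[t] for t in range(end - step + 1, end + 1)) / step
        for end in range(first + step, last + 1, step)
    ]

# Same counter, sampled every 10s versus once after 30s.
dense = [(0, 0), (10, 0), (20, 0), (30, 60)]
sparse = [(0, 0), (30, 60)]

fine = pdps(dense, 10)      # follows the burst in the last 10 seconds
coarse = pdps(sparse, 10)   # three identical estimates from one interval
print(fine, sum(fine) / len(fine))
print(coarse, sum(coarse) / len(coarse))
```

The individual PDPs differ between the two runs, but their average over the
whole 30 seconds is the same.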
 | Note that the xff for your RRA is 0. This has no effect since steps=1 for
 | this RRA, and as I remember it xff only comes into effect when accumulating
 | multiple PDPs into an RRA.
 |
 | > Now if I do the measure for 20 a bit early or a bit late, I would expect
 | > this PDP to have an unknown value since the interval for that PDP exceeded
 | > the heartbeat.
 | > If it is late, I am getting the expected result :
 | >
 | > rrdtool update test.rrd 997147700:0 997147710:10 997147721:21 \
 | >     997147730:30 997147740:40
 | >
 | > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
 | > tik
 | >
 | > 997147710: 1.0000000000e+00
 | > 997147720: nan
 | > 997147730: 1.0000000000e+00
 | > 997147740: 1.0000000000e+00
 | [...]
 |
 | This is fine. The PDP for 997147711 -> 997147720 includes no known values,
 | and is hence unknown. The PDP for 997147721 -> 997147730 includes only 1sec
 | of unknown time, which is less than heartbeat, and hence the PDP is known.
 |
 | > rrdtool update test.rrd 997147700:0 997147710:10 997147719:19 \
 | >     997147730:30 997147740:40
 | >
 | > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
 | > tik
 | >
 | > 997147710: 1.0000000000e+00
 | > 997147720: nan
 | > 997147730: nan
 | > 997147740: 1.0000000000e+00
 | [...]
 |
 | This looks wrong. You may have tripped up a bug in RRD. From my
 | understanding the last time I looked at RRD, the 997147720: output should
 | not be nan since the period 997147711->997147720 has only 1sec unknown, and
 | since 1sec is less than heartbeat, that PDP should be OK.
 |
 | > rrdtool update test.rrd 997147700:0 997147710:10 997147719:19 \
 | >     997147729:29 997147740:40
 | >
 | > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
 | > tik
 | >
 | > 997147710: 1.0000000000e+00
 | > 997147720: 1.0000000000e+00
 | > 997147730: nan
 | > 997147740: nan
 |
 | This looks wrong too. 997147711->997147720 has 1sec unknown, hence PDP OK.
 | 997147721->997147730 has only 1sec unknown too, so should be known.
 | 997147731->997147740 is all unknown, so unknown is correct.
 |
 | > On the other hand, I can stretch some readings up to 18 seconds without
 | > affecting anything :
 | >
 | > rrdtool update test.rrd 997147700:0 997147710:10 997147711:11 \
 | >     997147729:29 997147730:30 997147740:40
 | >
 | > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
 | > tik
 | >
 | > 997147710: 1.0000000000e+00
 | > 997147720: 1.0000000000e+00
 | > 997147730: 1.0000000000e+00
 | > 997147740: 1.0000000000e+00
 |
 | Surprisingly, this is actually correct. 997147711->997147720 has 9sec
 | unknown < step, so known. 997147721->997147730 also has 9sec unknown < step,
 | so known. The large unknown period between 997147712->997147729 still leaves
 | enough known values in the PDPs on each side for them both to be known.
 |
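The per-PDP bookkeeping used in the four cases above can be sketched in Python.
This models the rule as it is described in this thread, not rrdtool's actual
update code. The assumptions: an interval between updates longer than heartbeat
is entirely unknown (an interval exactly equal to heartbeat counts as known), a
PDP is known when its unknown seconds are fewer than heartbeat, and the first
update sits on a step boundary. The function name is hypothetical:

```python
def pdp_status(updates, step=10, heartbeat=10):
    """Return {pdp_end: (unknown_seconds, known)} for step-aligned PDPs."""
    unknown = set()
    for t0, t1 in zip(updates, updates[1:]):
        if t1 - t0 > heartbeat:
            # The whole interval (t0, t1] is treated as unknown.
            unknown.update(range(t0 + 1, t1 + 1))
    first_end = updates[0] - updates[0] % step + step
    status = {}
    for end in range(first_end, updates[-1] + 1, step):
        secs = sum(1 for t in range(end - step + 1, end + 1) if t in unknown)
        status[end] = (secs, secs < heartbeat)
    return status

B = 997147700  # first update time used in the thread
cases = {
    "late":      [B, B + 10, B + 21, B + 30, B + 40],
    "early":     [B, B + 10, B + 19, B + 30, B + 40],
    "early x2":  [B, B + 10, B + 19, B + 29, B + 40],
    "stretched": [B, B + 10, B + 11, B + 29, B + 30, B + 40],
}
for name, updates in cases.items():
    print(name)
    for end, (secs, known) in pdp_status(updates).items():
        print(f"  {end}: {secs}s unknown -> {'known' if known else 'nan'}")
```

Under these assumptions the sketch agrees with the reply on which PDPs should
be known in each case, so the fetch output of the two "early" cases differs
from what this reading of the rule predicts.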
 | I've Cc'd this to the rrd-users list in case someone else can comment on the
 | presence/absence of a bug. Note that you are floating in the areas of a
 | possible "off by one" bug, and I recall seeing that one of these was fixed
 | at some point. What version of rrd are you running?
 |
 | ABO
 |
 | --
 | Unsubscribe mailto:rrd-users-request at list.ee.ethz.ch?subject=unsubscribe
 | Help        mailto:rrd-users-request at list.ee.ethz.ch?subject=help
 | Archive     http://www.ee.ethz.ch/~slist/rrd-users
 | WebAdmin    http://www.ee.ethz.ch/~slist/lsg2.cgi
 |
 |

-- 
 ______    __   _
/_  __/_  / /  (_) Oetiker, ETZ J97, ETH, 8092 Zurich, Switzerland
 / // _ \/ _ \/ / phoneto:+41(0)1-632-5286  faxto:+41(0)1-632-1517
/_/ \.__/_.__/_/ mailto:oetiker at ee.ethz.ch http://people.ee.ethz.ch/~oetiker

