[rrd-users] Re: Bug? was RE: rddtool Heartbeat & Step
Tobias Oetiker
oetiker at ee.ethz.ch
Fri Aug 10 07:35:10 MEST 2001
Yesterday BAARDA, Don wrote:
|
| G'day,
it seems that some code reviewing in this area would be good,
at least to document how it really works :-)
pushed it to my todo list ...
thanks
tobi
|
| > -----Original Message-----
| > From: Blaise Lepeuple [mailto:blaise at yaga.com]
| > Sent: Tuesday, August 07, 2001 2:10 PM
| > To: don.baarda at baesystems.com
| > Subject: rddtool Heartbeat & Step
| >
| >
| > I'm sorry if you are the wrong person to ask this, but you
| > had your email
| > address on the online man page for "rrd create".
| >
| > If I should talk to somebody else, please redirect me to him.
| [...]
|
| It's been a while since I examined and understood the internals of RRD. I
| did go through it a while ago and satisfied myself that it worked, and since
| then have been content that it does what it is supposed to.
|
| > So here is the creation of the rrd :
| >
| > rrdtool create test.rrd -s 10 --start 997147699 DS:tik:COUNTER:10:0:U
| > RRA:AVERAGE:0:1:10
| [...]
|
| I've just had a look at the man page that now includes my description. It
| includes verbatim my stuff about heartbeat and step, but excluded the bit I
| had on the end of that email about xff. I'll add it here for your info,
| because it will help when you start making RRA's with steps>1;
|
| You are right that "xff" has little effect if you have few "unknown"
| PDPs, and setting "heartbeat" high is one way of reducing the number of
| "unknown" PDPs. However, it is worth remembering that "unknowns" can happen
| for other reasons. When setting "xff", you are deciding how many
| "unknown" PDPs are acceptable when accumulating into an RRA, and an
| "unknown" really means that rrd has no idea what the rate for the PDP was.
| So "xff" is a "garbage threshold" for how much missing input data you can
| tolerate when accumulating your data into "coarse grain", large "steps",
| RRAs.
|
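The "garbage threshold" role of xff described above can be sketched in a few lines of Python. This is only an illustrative model, not RRDtool's own code: the function name `consolidate_average`, the use of `None` for an unknown PDP, the choice to average only the known PDPs, and the exact comparison (unknown once the unknown count exceeds xff times the number of PDPs) are all assumptions made for the example.

```python
def consolidate_average(pdps, xff):
    """Combine PDP rates into one consolidated value (None means unknown).

    Assumption: the result is unknown once the number of unknown PDPs
    exceeds xff * len(pdps); otherwise the known PDPs are averaged.
    """
    unknown = sum(1 for p in pdps if p is None)
    if unknown > xff * len(pdps):
        return None
    known = [p for p in pdps if p is not None]
    return sum(known) / len(known) if known else None


# One unknown PDP out of four is tolerated with xff = 0.5 ...
print(consolidate_average([1.0, None, 1.0, 1.0], 0.5))
# ... but three unknown PDPs out of four are not.
print(consolidate_average([1.0, None, None, None], 0.5))
```

In this model an RRA with steps=1 gives the same result for any xff below 1: a single PDP is either known or unknown, so there is nothing for the threshold to trade off.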
| When setting "heartbeat", you are specifying a requirement on your
| samples. Remember that a long "heartbeat" means that you are happy for
| multiple PDPs to be estimated from a single sample, which means the
| individual PDPs are not really accurate. The nice thing about this though is
| that these not-quite-accurate PDPs accumulate accurately. The individual
| PDPs are estimated from the average rate over a longer period, hence when
| you accumulate these PDPs into a single period, the average rate is correct
| for that period. So "heartbeat" is a "garbage threshold" for how much
| inaccuracy you can tolerate in your "fine grain", small "steps", RRAs.
|
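The claim above that estimated PDPs "accumulate accurately" can be checked with a small worked example. The sketch below is a simplified model, not RRDtool's implementation: `fill_pdps` is a hypothetical helper, the samples are assumed to fall exactly on step boundaries, and the "true" per-step rates are invented purely for illustration.

```python
def fill_pdps(t0, v0, t1, v1, step):
    """Spread one counter delta evenly over every step between two samples.

    Assumption: both sample times lie on step boundaries.
    """
    rate = (v1 - v0) / (t1 - t0)
    return [rate] * ((t1 - t0) // step)


# Hypothetical real rates during three 10-second steps (never observed,
# because only one sample arrives at the end of the 30 seconds).
true_rates = [1.0, 2.0, 3.0]

# The counter went from 0 to 60 in 30 seconds, so each PDP is estimated
# from the average rate over the whole period.
estimated = fill_pdps(0, 0, 30, 60, 10)

print(estimated)                                # per-step estimates
print(sum(estimated) / len(estimated))          # average over the period
print(sum(true_rates) / len(true_rates))        # true average over the period
```

Each estimated PDP differs from the invented true rate for its step, yet the average over the full 30 seconds is the same in both cases, which is the trade-off a long heartbeat accepts.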
| Note that the xff for your RRA is 0. This has no effect since steps=1 for
| this RRA, and as I remember it xff only comes into effect when accumulating
| multiple PDPs into an RRA.
|
| > Now if I do the measure for 20 a bit early or a bit late, I
| > would expect
| > this PDP to have an unknown value since the interval for that
| > pdp exceeded
| > the heartbeat.
| > If it is late, I am getting the expected result :
| >
| > rrdtool update test.rrd 997147700:0 997147710:10 997147721:21
| > 997147730:30
| > 997147740:40
| >
| > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
| > tik
| >
| > 997147710: 1.0000000000e+00
| > 997147720: nan
| > 997147730: 1.0000000000e+00
| > 997147740: 1.0000000000e+00
| [...]
|
| This is fine. The PDP for 997147711 -> 997147720 includes no known values,
| and is hence unknown. The PDP for 997147721 -> 997147730 includes only 1sec
| unknown, which is less than heartbeat, and hence the PDP is known.
|
| > rrdtool update test.rrd 997147700:0 997147710:10 997147719:19
| > 997147730:30
| > 997147740:40
| >
| > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
| > tik
| >
| > 997147710: 1.0000000000e+00
| > 997147720: nan
| > 997147730: nan
| > 997147740: 1.0000000000e+00
| [...]
|
| This looks wrong. You may have tripped up a bug in RRD. From my
| understanding the last time I looked at RRD, the 997147720: output should
| not be nan since the period 997147711->997147720 has only 1sec unknown, and
| since 1sec is less than heartbeat, that PDP should be OK.
|
| > rrdtool update test.rrd 997147700:0 997147710:10 997147719:19
| > 997147729:29
| > 997147740:40
| >
| > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
| > tik
| >
| > 997147710: 1.0000000000e+00
| > 997147720: 1.0000000000e+00
| > 997147730: nan
| > 997147740: nan
|
| This looks wrong too. 997147711->997147720 has 1sec unknown, hence PDP OK.
| 997147721->997147730 has only 1sec unknown too, so should be known.
| 997147731->997147740 is all unknown, so unknown is correct.
|
| > On the other hand, I can stretch up to 18 seconds some
| > readings without
| > affecting anything :
| >
| > rrdtool update test.rrd 997147700:0 997147710:10 997147711:11
| > 997147729:29
| > 997147730:30 997147740:40
| >
| > rrdtool fetch test.rrd AVERAGE --start 997147710 --end 997147740 :
| > tik
| >
| > 997147710: 1.0000000000e+00
| > 997147720: 1.0000000000e+00
| > 997147730: 1.0000000000e+00
| > 997147740: 1.0000000000e+00
|
| Surprisingly, this is actually correct. 997147711->997147720 has 9sec
| unknown < step so known. 997147721->997147730 also has 9sec unknown < step
| so known. The large unknown period between 997147712->997147729 still leaves
| enough known values in the PDPs on each side for them both to be known.
|
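The per-PDP reasoning used in the replies above can be written down as a small Python model. It encodes only the rule as stated in this message, and is not taken from the RRDtool source: every second inside a sample interval longer than heartbeat counts as unknown, and a PDP is known while its unknown seconds stay below the limit. The message compares against heartbeat in some places and step in others; both are 10 here, so the sketch uses heartbeat. The name `pdp_known` and its parameters are hypothetical.

```python
def pdp_known(samples, start, step, heartbeat):
    """Map each PDP end time to True (known) or False (unknown).

    Assumptions: `samples` is a sorted list of update timestamps, and a
    PDP ending at time `end` covers the seconds end-step+1 .. end.
    """
    unknown = set()
    for prev, cur in zip(samples, samples[1:]):
        if cur - prev > heartbeat:
            # The whole interval after `prev` up to `cur` is unknown.
            unknown.update(range(prev + 1, cur + 1))
    result = {}
    for end in range(start + step, samples[-1] + 1, step):
        missing = sum(1 for s in range(end - step + 1, end + 1) if s in unknown)
        result[end] = missing < heartbeat
    return result


base = 997147700
late = [base, base + 10, base + 21, base + 30, base + 40]
early = [base, base + 10, base + 19, base + 30, base + 40]
stretched = [base, base + 10, base + 11, base + 29, base + 30, base + 40]

for name, samples in (("late", late), ("early", early), ("stretched", stretched)):
    print(name, pdp_known(samples, base, 10, 10))
```

Under this model the "late" and "stretched" cases agree with the fetch output quoted above, while the "early" case marks only 997147730 as unknown, which is the result the reply expected rather than the one rrdtool printed.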
| I've Cc'd this to the rrd-users list in case someone else can comment on the
| presence/absence of a bug. Note that you are floating in the areas of a
| possible "off by one" bug, and I recall seeing that one of these was fixed
| at some point. What version of rrd are you running?
|
| ABO
|
| --
| Unsubscribe mailto:rrd-users-request at list.ee.ethz.ch?subject=unsubscribe
| Help mailto:rrd-users-request at list.ee.ethz.ch?subject=help
| Archive http://www.ee.ethz.ch/~slist/rrd-users
| WebAdmin http://www.ee.ethz.ch/~slist/lsg2.cgi
|
|
--
______ __ _
/_ __/_ / / (_) Oetiker, ETZ J97, ETH, 8092 Zurich, Switzerland
/ // _ \/ _ \/ / phoneto:+41(0)1-632-5286 faxto:+41(0)1-632-1517
/_/ \.__/_.__/_/ mailto:oetiker at ee.ethz.ch http://people.ee.ethz.ch/~oetiker