[rrd-users] Re: Problem on update

Alex van den Bogaerdt alex at ergens.op.HET.NET
Sat Jul 13 02:46:56 MEST 2002


David Lovy wrote:

> Thanks Alex...  I'm not suggesting that RRDtool knows about points in time, however if I've read the database correctly, it defines the intervals by points in time...  i.e. from a rrdtool dump:
> 
> <!-- 2002-02-18 19:00:00 EST / 1014076800 --> <row><v> NaN </v><v> NaN </v></row>  
> 
> Note the exact date ("18 February 2002") and time (19:00:00 EST) that this row pertains to...  No information about how long this interval is.  All you know is that it ended at this time.  Actually, the exact time the point was entered is lost since RRDtool moves it to the closest point defining the end of an interval.

The *rate* is valid throughout the interval defined by:

- the date and time specified as a remark
- the number of PDPs in this CDP  (pdp_per_row inside the RRA)
- the number of seconds in each PDP  (step inside the RRD)

This specifies an interval of "step" times "pdp_per_row" seconds
ending at "date and time".
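
To make that arithmetic concrete, here is a small Python sketch.  The
step of 300 seconds and pdp_per_row of 1 are assumed values for
illustration (the dump line quoted above does not show them), and
cdp_interval is a made-up helper name, not something RRDtool provides.

```python
from datetime import datetime, timezone

def cdp_interval(end_ts, step, pdp_per_row):
    """Return (start, end) in epoch seconds of the interval one row covers."""
    length = step * pdp_per_row  # seconds covered by one CDP
    return end_ts - length, end_ts

# Row timestamp taken from the dump quoted above; step and pdp_per_row are assumed.
start, end = cdp_interval(1014076800, step=300, pdp_per_row=1)
for label, ts in (("start", start), ("end", end)):
    print(label, ts, datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```

With a larger pdp_per_row the end stays the same and only the start
moves further back.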

And yes, the exact time is not recorded.  This is because RRDtool, by
design, computes a rate from what you input, normalizes it and uses it
to compute a normalized rate during a normalized interval.  The polling
interval and the storage interval do not need to match.  One example of
that is my way of monitoring modem lines.  In that case, "step" is just
one second, pdp_per_row (aka "steps") is 300 and the actual "polling"
interval varies.  I update at least every 300 seconds (heartbeat set
to 400) but am able to specify the moment exactly up to the second and
will, if necessary, update every second.
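
A setup along those lines could be created roughly as sketched below.
Only step (1), steps (300) and heartbeat (400) come from the text
above; the file name, the DS name, the GAUGE type, the 0..1 limits, the
xff of 0.5 and the row count are assumptions for illustration.  The
Python helper merely builds the argument list and does not run anything.

```python
def create_args(filename, ds_name, step=1, heartbeat=400, pdp_per_row=300, rows=600):
    """Build an argument list for `rrdtool create` (nothing is executed)."""
    return [
        "rrdtool", "create", filename,
        "--step", str(step),
        # DS:name:type:heartbeat:min:max (type and limits are assumed)
        f"DS:{ds_name}:GAUGE:{heartbeat}:0:1",
        # RRA:CF:xff:steps:rows (xff and rows are assumed)
        f"RRA:AVERAGE:0.5:{pdp_per_row}:{rows}",
    ]

# Hypothetical file and DS names.
print(" ".join(create_args("modem01.rrd", "state")))
```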

> > Together with heartbeat the interval is well defined.  If you don't want
> > the previous interval to be defined, set it to the value that indicates
> > this (NaN aka Unknown).
> 
> right, however the previous value is valid on or about the end of the previous interval, but has no impact or significance to the value measured on or around the end of this interval.  See my speedometer explanation previously.  I agree with everything you and Tobi have preached as it pertains to counter based RRAs and any RRAs with more than one pdp_per_row, but I haven't heard any valid explanations for how this method relates to gauge based RRAs (i.e. 1 gauge measured pdp_per_row).

I gave an example of how I *need* to set multiple CDPs with one update.
Your suggestion would break that.  I'm telling RRDtool that the rate until
"now" (or whatever time I specify) is "1" (or 0, or whatever).  I'm not
telling RRDtool that the current value at this time is 1.  Perhaps the
confusion is in the name "GAUGE".  A similar confusion can happen with
the DS type "ABSOLUTE", which does not store the absolute value in a
mathematical sense.

> I like this method for reliable event based metrics, but many of the things we measure are not event based or reliable.  BTW, If I wanted to use this event thing for 2000 modems, I would create one DS per modem...  I would set the heartbeat very large (i.e. some modems stay up for days).  Then to graph it, I would sum up the DSs right?  My main concern...  What if I lose an event?  Or even several events?  How would I know?

I use a heartbeat timer (note: not RRDtool's heartbeat) to update the
correct status every x seconds.  I input the current state of the line
(1 for up, 0 for down).  When I receive a trap, I update the previous
state (so: also 1 for up, 0 for down).  In case of a missed trap, the
system catches up.  If an event has disappeared, no problem.  If many
events disappear, something else is wrong.  The heartbeat time (this
time: RRDtool's heartbeat) does not have to be much larger than the
amount of time in between updates (the other heartbeat timer, in the
front end).  The hardest part of this process is to not update more
than once in any given second.  Obviously traps should be preferred
over heartbeat updates.
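
One way to handle the "not more than once per second" part is sketched
below in Python.  This is only an illustration under assumptions: the
function name pick_updates, the event tuples and the source labels
"trap" and "timer" are invented here, and a real front end would have
to buffer events for the current second before sending anything.

```python
def pick_updates(events):
    """events: iterable of (timestamp, source, value), source is 'trap' or 'timer'.

    Return at most one (timestamp, value) per whole second, in time order.
    A trap replaces a timer update that falls in the same second; if two
    events of the same kind collide, the first one seen is kept.
    """
    chosen = {}
    for ts, source, value in events:
        second = int(ts)  # updates are keyed on whole seconds
        current = chosen.get(second)
        if current is None or (source == "trap" and current[0] != "trap"):
            chosen[second] = (source, value)
    return [(second, chosen[second][1]) for second in sorted(chosen)]

# Hypothetical events: a timer update and a trap land in the same second (160).
events = [(100, "timer", 1), (160.2, "timer", 0), (160.7, "trap", 1), (220, "timer", 0)]
print(pick_updates(events))
```

In this example the trap wins for second 160, so the state that was
valid up to the trap is what gets stored.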

Also, I don't like to have several DSes per RRD for a per-device way
of storing information.  Multiple DSes can be used to store information
about the same entity (ifInOctets+ifOutOctets for instance).  What if
the number of modems in your terminal server changes?  I'd rather
generate 30 new RRDs when I activate another modem pool instead of
having to modify an existing RRD to accommodate those new modems.

Adding up multiple DSes from many RRDs is as simple as adding up
multiple DSes from the same RRD.
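
As a rough illustration, the Python sketch below builds the DEF and
CDEF arguments one would pass to rrdtool graph.  The helper name
sum_defs, the file names, the DS name "state" and the vnames are all
made up, and the sketch assumes two or more sources.  Whether the DSes
live in one file or in many, only the file name in each DEF changes.

```python
def sum_defs(sources, cf="AVERAGE"):
    """sources: list of (rrd_file, ds_name) pairs.

    Return DEF arguments, one per source, plus one CDEF that adds them up.
    """
    defs, names = [], []
    for i, (rrd_file, ds_name) in enumerate(sources):
        vname = f"m{i}"
        names.append(vname)
        defs.append(f"DEF:{vname}={rrd_file}:{ds_name}:{cf}")
    # RPN: push every value, then apply "+" once per extra operand.
    rpn = ",".join(names + ["+"] * (len(names) - 1))
    return defs + [f"CDEF:total={rpn}"]

# Hypothetical per-modem RRDs, each with a DS called "state".
for arg in sum_defs([("modem01.rrd", "state"), ("modem02.rrd", "state"), ("modem03.rrd", "state")]):
    print(arg)
```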

People who claim that using many RRDs instead of one is more
resource intensive are right, but that's no issue for me.  If
it became an issue, I would either have to buy faster hardware or
rethink my strategy.

> > Use the tool for what it is designed and it makes your life easy,
> > use the tool the wrong way and it makes your life difficult.
> 
> Actually, MRTG and RRDtool are very easy the way I use them...  It's all the explaining I have to do to justify the goofy values that's difficult. ;-)  I'm just recommending a simple option that would allow RRDTool to work cleanly for both of our scenarios.  If you like, I can try to drum up some real gauge RRDs and some modified with the data I want to see to show you how the numbers should be.

For modems in use the way you want to use it, you could use the MIN, MAX
or LAST CFs.  That way you won't show that the values you're presenting
are goofy ones.  They would still be goofy values though.

What happens if you fail to see a call?
Assume a modem bank with 30 modems:
   00:00:00  # of modems in use = 20
   00:00:01  10 people dial in
   00:01:00  another attempt is made and fails
   00:02:00  ditto
   00:04:00  the 10 people disconnect
   00:04:30  another 10 disconnect
   00:05:00  # of modems in use = 10

You query @ 00:00 and get "20".  @ 00:05 you get "10".  You feed
RRDtool with the numbers 20 and 10.

Visible on the image (or report):
   between 23:55 and 00:00  20 modems in use
   between 00:00 and 00:05  10 modems in use

Now *that* is hard to explain.  *Why* was it not possible to dial in
at 00:01:00 and 00:02:00 ?

This isn't an RRDtool problem; the logic behind the monitoring setup
is flawed.
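
The same timeline, replayed in a few lines of Python, shows what the
two polls see compared with what actually happened.  The timeline
values are the ones from the example above; the helper name in_use is
invented for this sketch.

```python
# (seconds after 00:00:00, modems in use from that moment on)
timeline = [(0, 20), (1, 30), (240, 20), (270, 10)]

def in_use(t):
    """Modems in use at second t according to the timeline above."""
    value = timeline[0][1]
    for start, count in timeline:
        if start <= t:
            value = count
    return value

polled = [in_use(0), in_use(300)]               # what a 5-minute poll sees
peak = max(in_use(t) for t in range(0, 301))    # what actually happened
print(polled, peak)
```

The polls report 20 and 10, while all 30 modems were busy for most of
the interval, which is why the failed attempts cannot be explained
from the stored numbers.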

Then you have the normalizing-numbers problem.  If you update at
00:00:00 and at 00:05:05, you're defining an interval of 305 seconds.
The number of modems in use at 00:05:05 contributes to the interval
between 00:00 and 00:05 but also to the interval between 00:05 and
00:10.  If the number of modems at 00:05:05 is 1 and the number of modems
at 00:10 is 0, the interval between 00:05 and 00:10 will be computed
as follows:  5 seconds out of 300 seconds the number of modems was 1,
295 seconds out of 300 seconds the number of modems was 0.  What you're
asking is essentially to forget about the 5 seconds at the start of
this last interval and set the value for the entire interval to 0.
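
Spelled out as a small Python sketch, the computation for that interval
is a time-weighted average.  The function name normalize and the
(seconds, value) representation are choices made for this illustration,
not RRDtool internals.

```python
def normalize(segments, step=300):
    """segments: list of (seconds, value) pairs that together fill one step.

    Return the time-weighted average value over that step.
    """
    if sum(seconds for seconds, _ in segments) != step:
        raise ValueError("segments must add up to exactly one step")
    return sum(seconds * value for seconds, value in segments) / step

# 5 seconds at 1 modem, 295 seconds at 0 modems, as in the text above.
print(normalize([(5, 1), (295, 0)]))  # equals 5/300
```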
OK, fine.  Now what happens when you (or: everybody else) have another
RRA with twice the number of PDPs per CDP ?  RRDtool will consolidate
two intervals (for instance: 00:00-00:05 and 00:05-00:10) into one
(which would then be 00:00-00:10).  We have 1 modem during 00:00-00:05
and 0 modems during 00:05-00:10.  What is 00:00-00:10 going to be set
to?  Using an AVERAGE CF it *has*to*become* 0.5.

You're getting fractional numbers anyway so why not compute the correct
rate for 00:05-00:10 ?

Again, using another CF will overcome this problem.  I suggest using
LAST or MAX.  LAST was introduced to get rid of fractions but MAX
would probably be better in this case.  You want to see "1" and not "0"
as the "correct" number of modems in use during 00:00-00:10.
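
For comparison, this Python sketch applies each CF to the two
five-minute values from the example (1 and 0).  It is a simplification
for illustration only: it ignores unknown data and the xff setting, and
the CONSOLIDATE table is a made-up name.

```python
# Simplified consolidation functions; unknown values and xff are ignored.
CONSOLIDATE = {
    "AVERAGE": lambda pdps: sum(pdps) / len(pdps),
    "MIN": min,
    "MAX": max,
    "LAST": lambda pdps: pdps[-1],
}

pdps = [1, 0]  # 00:00-00:05 and 00:05-00:10
for cf, fn in CONSOLIDATE.items():
    print(cf, fn(pdps))
```

AVERAGE yields 0.5, LAST yields 0, and only MAX yields 1 for the
combined 00:00-00:10 interval.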
-- 
   __________________________________________________________________
 / alex at slot.hollandcasino.nl                  alex at ergens.op.het.net \
| work                                                         private |
| My employer is capable of speaking therefore I speak only for myself |
+----------------------------------------------------------------------+
| Technical questions sent directly to me will be nuked. Use the list. | 
+----------------------------------------------------------------------+
| http://faq.mrtg.org/                                                 |
| http://rrdtool.eu.org  --> tutorial                                  |
+----------------------------------------------------------------------+
