[rrd-users] Avoiding pitfalls when using GAUGE with some kind of data

Cristian Ghezzi rrd-users01 at ghezzi.net
Sat Aug 21 23:45:21 MEST 2004


In these examples I show that you must be very careful when using 
RRDtool if you don't want to come up with wrong data. In particular, 
when using GAUGE, you should be careful to insert ONE value per step and 
do it exactly ON STEP BOUNDARIES.

* GAUGE EXAMPLE #1: "many updates per step are wrong"

Suppose you want to plot the number of people that get off the bus at 
some bus stop.
You start inserting values in RRDtool at time 1093110000.
You define a step of 10 minutes, but you don't wait 10 minutes before 
storing new data: you just store each value as soon as you have it, 
because you might otherwise forget it.

$ rrdtool create test.rrd --start 1093110000 --step 600 \
    DS:people:GAUGE:1200:U:U RRA:AVERAGE:0.5:1:6

After 600 seconds you see 10 people getting off the bus:

$ rrdtool update test.rrd 1093110600:10

At 750 seconds from the start, you see 20 people:

$ rrdtool update test.rrd 1093110750:20

At 1200 seconds from the start, you count 30 more people:

$ rrdtool update test.rrd 1093111200:30

The tool should now have enough data for 2 PDPs, one at 600 and another 
at 1200.
The first PDP must be exactly 10, and the second PDP must be the average 
of 20 and 30, which is (20+30)/2=25.

$ rrdtool fetch test.rrd AVERAGE --start 1093110000 --end 1093112400
                         people

1093110000: -1.#IND000000e+000
1093110600: 1.0000000000e+001
1093111200: 2.7500000000e+001
1093111800: -1.#IND000000e+000
1093112400: -1.#IND000000e+000

Uh oh, the second PDP is 27.5, not 25!
If you know RRDtool, you also know why: each PDP is a "weighted average" 
of all the values inserted in the step, the weight being the time elapsed 
since the last step boundary or since the last insertion, whichever is 
smaller.
The value 20 was inserted 150 seconds after the step boundary at 1093110600.
The value 30 was inserted 450 seconds after the previous update.
The weighted average is (20*150 + 30*450)/600 = 27.5
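
The arithmetic above can be checked with a few lines of Python. This is 
only a sketch of the weighting rule as described in this message, not 
RRDtool's own code; the function name weighted_pdp is made up for 
illustration.

```python
def weighted_pdp(step, samples):
    """Time-weighted average of the values seen during one step.

    samples is a list of (seconds_covered, value) pairs; the seconds
    must add up to exactly one step.
    """
    covered = sum(seconds for seconds, _ in samples)
    if covered != step:
        raise ValueError(f"samples cover {covered} s, expected {step} s")
    return sum(seconds * value for seconds, value in samples) / step


# Second step of example #1: 20 counts for 150 s, 30 counts for 450 s.
print(weighted_pdp(600, [(150, 20), (450, 30)]))  # 27.5
# The plain average you were hoping for:
print((20 + 30) / 2)                              # 25.0
```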

So RRDtool thinks that 2.5 more people got off the bus in the step from 
1093110600 to 1093111200.


* GAUGE EXAMPLE #2: "one update per step still wrong when not aligned"

You are still counting people getting off the bus, but this time you 
wait the full 600 seconds before each update to the database, so that 
PDPs will not be computed from the weighted average of two or more numbers.

$ rrdtool create test.rrd --start 1093110000 --step 600 \
    DS:people:GAUGE:1200:U:U RRA:AVERAGE:0.5:1:6

After 600 seconds you see 10 people getting off the bus, and you insert 
this value in the database:

$ rrdtool update test.rrd 1093110600:10

At 750 seconds from the start, you see 20 people, and remember the value.
At 1200 seconds from the start, you count 30 more people, which added 
to the previous value gives 50.
You are a bit slow at doing calculations, so you insert the value 100 
seconds after the step boundary, which was 1093111200.

$ rrdtool update test.rrd 1093111300:50

You check the database, and everything seems to work perfectly this time:

$ rrdtool fetch test.rrd AVERAGE --start 1093110000 --end 1093112400
                         people

1093110000: -1.#IND000000e+000
1093110600: 1.0000000000e+001
1093111200: 5.0000000000e+001
1093111800: -1.#IND000000e+000
1093112400: -1.#IND000000e+000

During the next step you count 20 people from one bus, and because you 
don't have to make any calculation this time, you manage to store the 
value on the step boundary, which is 1093111800:

$ rrdtool update test.rrd 1093111800:20

Now you should be able to find these three values in the database: 10, 
50, 20:

$ rrdtool fetch test.rrd AVERAGE --start 1093110000 --end 1093112400
                         people

1093110000: -1.#IND000000e+000
1093110600: 1.0000000000e+001
1093111200: 5.0000000000e+001
1093111800: 2.5000000000e+001
1093112400: -1.#IND000000e+000

Oh no! Even though you supplied the last value exactly on time, what you 
see is wrong! It shows 25 instead of 20!
How is that possible?
It is for the same reason as before: values are weighted, so for the 
last step you have 50 for 100 seconds (remember it was inserted with a 
100-second delay) and 20 for the remaining 500 seconds, which gives 
(50*100 + 20*500)/600=25. RRDtool has added 5 ghosts.
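
To replay both examples without touching a real database, here is a rough 
Python model of how GAUGE updates get spread over the steps. It is a 
sketch under simplifying assumptions, not RRDtool's implementation: the 
start time sits on a step boundary, updates arrive in time order, each 
value is taken to hold since the previous update, and the heartbeat and 
unknown-data handling are ignored. The name simulate_gauge is hypothetical.

```python
def simulate_gauge(start, step, updates):
    """Return {step_end_timestamp: pdp} for every completed step.

    updates is a time-ordered list of (timestamp, value) pairs.
    Assumption: each value applies to the whole interval since the
    previous update (or since start for the first one).
    """
    pdps = {}
    prev = start   # end of the interval already accounted for
    acc = 0.0      # sum of value * seconds inside the current step
    for timestamp, value in updates:
        while prev < timestamp:
            boundary = prev - prev % step + step  # end of the current step
            end = min(timestamp, boundary)
            acc += value * (end - prev)
            prev = end
            if prev == boundary:                  # step complete, store the PDP
                pdps[boundary] = acc / step
                acc = 0.0
    return pdps


START, STEP = 1093110000, 600

example1 = [(1093110600, 10), (1093110750, 20), (1093111200, 30)]
print(simulate_gauge(START, STEP, example1))
# {1093110600: 10.0, 1093111200: 27.5}

example2 = [(1093110600, 10), (1093111300, 50), (1093111800, 20)]
print(simulate_gauge(START, STEP, example2))
# {1093110600: 10.0, 1093111200: 50.0, 1093111800: 25.0}
```

The late update at 1093111300 is what leaks 100 seconds' worth of the 
value 50 into the step ending at 1093111800.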


* CONCLUSIONS

This is not a bug in RRDtool, because if you were plotting a different 
kind of data, for example room temperature, the results would be 
conceptually correct.

I suggest the introduction of a new flag to disable "weighting" when 
averaging values for the step.
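
Without such a flag, the only safe route I can see is to do the 
bookkeeping on the client: add up the raw counts per step and submit one 
value per step, stamped exactly on the boundary. Below is a minimal Python 
sketch of that idea. The helper bucket_counts is hypothetical, and it 
assumes that a count taken exactly on a boundary belongs to the step ending 
there and that steps with no observations should be stored as 0.

```python
from collections import defaultdict


def bucket_counts(events, step):
    """Sum raw (timestamp, count) observations per step.

    Returns "timestamp:value" strings, one per step boundary, in the
    format accepted by `rrdtool update`. Empty steps between the first
    and last boundary are emitted as 0 so that no value gets stretched
    over more than one step.
    """
    totals = defaultdict(int)
    for timestamp, count in events:
        boundary = -(-timestamp // step) * step  # round up to the step end
        totals[boundary] += count
    if not totals:
        return []
    first, last = min(totals), max(totals)
    return [f"{b}:{totals[b]}" for b in range(first, last + step, step)]


events = [(1093110600, 10), (1093110750, 20), (1093111200, 30)]
for arg in bucket_counts(events, 600):
    print("rrdtool update test.rrd", arg)
# rrdtool update test.rrd 1093110600:10
# rrdtool update test.rrd 1093111200:50
```

Since every value then covers exactly one full step, the weighted average 
is simply the value itself.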

While I'm at it, may I also suggest the introduction of a flag to 
replace averaging with a min/max function? The need for it was already 
discussed in a past email ("MAX doesn't work as expected: peaks are 
filtered").




Thank you


Cristian Ghezzi
