[rrd-users] Simple questions about RRD

Mon Dec 7 20:46:57 CET 2009

Jean-Yves Avenard wrote:

>It shows the maximum as being 2972W ; which is indeed the maximum with
>the actual data at the highest resolution.
>But it's definitely not the maximum of 5 minutes average as the graph
>shows: nothing is over 2200W
>
>That graph is created with :
>
>$ret = exec("$RRDTOOL graph $name -l 0 \
>-t '$title' \
>-x $legend \
>--step $res --start e-$start --end $timestamp \
>-w $width -h $height \
>DEF:total=currentcost.rrd:total:AVERAGE DEF:ch2=currentcost.rrd:ch2:AVERAGE \
>DEF:solar=solarprod.rrd:total:AVERAGE DEF:ch1=currentcost.rrd:ch1:AVERAGE \
>DEF:totalmin=currentcost.rrd:total:MIN:reduce=AVERAGE \
>DEF:ch2min=currentcost.rrd:ch2:MIN:reduce=AVERAGE \
>DEF:solarmin=solarprod.rrd:total:MIN:reduce=AVERAGE \
>DEF:ch1min=currentcost.rrd:ch1:MIN:reduce=AVERAGE \
>DEF:totalmax=currentcost.rrd:total:MAX:reduce=AVERAGE \
>DEF:ch2max=currentcost.rrd:ch2:MAX:reduce=AVERAGE \
>DEF:solarmax=solarprod.rrd:total:MAX:reduce=AVERAGE \
>DEF:ch1max=currentcost.rrd:ch1:MAX:reduce=AVERAGE \

Why are you using reduce=AVERAGE on a maximum?

Looking back at your first post, it looks like your DS is at 60s
intervals, therefore a graphing interval of 300s will require further
consolidation. The data in the graphs is quite peaky, so this is
likely to have a significant effect - e.g. at around time 21.5 there is
quite a narrow peak, and it's quite likely that the average of 5 max
values is considerably less than the max of 5 values. For example, it
could be 4 samples of 0.6k and one of 5.6k, or it could be 5 samples
of 1.6k - the same average, but quite a difference in the max.
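
To put numbers on that, here is a small Python sketch using the two
hypothetical sample sets mentioned above. The names are made up for
illustration, and this is plain arithmetic, not a model of what
rrdtool does internally.

```python
# Two hypothetical sets of five 1-minute samples, in kW.
peaky = [0.6, 0.6, 0.6, 0.6, 5.6]
steady = [1.6, 1.6, 1.6, 1.6, 1.6]


def summarise(samples):
    """Return (average, maximum) of a list of samples."""
    return sum(samples) / len(samples), max(samples)


peaky_avg, peaky_max = summarise(peaky)
steady_avg, steady_max = summarise(steady)

# Averaging hides the spike: both sets average about 1.6k,
# but only the peaky one ever reached 5.6k.
print(f"peaky:  average {peaky_avg:.1f}k, max {peaky_max:.1f}k")
print(f"steady: average {steady_avg:.1f}k, max {steady_max:.1f}k")
```

So a 300s value built by averaging five 1-minute values cannot tell
these two cases apart, whereas one built by taking their max can.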

Having said that, I can't see where the difference is coming from, 
unless the VDEF is using the raw data before it's been consolidated - 
someone would have to look at the source to see if that's possible.

I will say that it can make things a lot easier to see if you create 
a file of test data - carefully chosen to exercise the functionality 
in doubt. So perhaps something like this :

1260214800:1
1260214860:1
1260214920:6
1260214980:1
1260215040:1
1260215100:1

Now that should produce the following values when consolidated over 
the 5 minutes from 1260214800 to 1260215100 :
min = 1
ave = 2
max = 6
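
As a cross-check of those expected values, a minimal Python sketch is
below. It assumes each sample is attributed to the 60s interval ending
at its timestamp, so the five intervals between 1260214800 and
1260215100 carry the values 1, 6, 1, 1, 1 and the first line only
marks the start. The variable names are illustrative only.

```python
# Test data from above: "timestamp:value" lines at 60 s spacing.
updates = """\
1260214800:1
1260214860:1
1260214920:6
1260214980:1
1260215040:1
1260215100:1
"""

START, END = 1260214800, 1260215100

# Parse each line into (timestamp, value).
samples = []
for line in updates.splitlines():
    ts, value = line.split(":")
    samples.append((int(ts), float(value)))

# Assumption: a sample stamped ts covers the interval (ts - 60, ts],
# so the 5-minute window holds the samples with START < ts <= END.
window = [v for ts, v in samples if START < ts <= END]

cons_min = min(window)
cons_avg = sum(window) / len(window)
cons_max = max(window)

print(f"min = {cons_min:g}")
print(f"ave = {cons_avg:g}")
print(f"max = {cons_max:g}")
```

If a graph of the real test file shows anything other than these
three numbers, that narrows down where the discrepancy comes from.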

>Now I have an extra question on how the RRA/consolidation works...
>
>Say I store 5 years of 5 minutes average ; is there any points of also
>storing 5 years of 30 minutes average, 5 years of 2 hours average etc?
>
>I would have assumed that it wouldn't matter ; except when retrieving
>the 30 minutes/2 hours average extra processing is required as it
>needs to retrieve more values.
>
>Now; if I use reduce=AVERAGE for all my graph like
>DEF:total=currentcost.rrd:total:AVERAGE DEF:ch2=currentcost.rrd:ch2:AVERAGE \
>DEF:solar=solarprod.rrd:total:AVERAGE DEF:ch1=currentcost.rrd:ch1:AVERAGE \
>DEF:totalmin=currentcost.rrd:total:MIN:reduce=AVERAGE \
>DEF:ch2min=currentcost.rrd:ch2:MIN:reduce=AVERAGE \
>DEF:solarmin=solarprod.rrd:total:MIN:reduce=AVERAGE \
>DEF:ch1min=currentcost.rrd:ch1:MIN:reduce=AVERAGE \
>DEF:totalmax=currentcost.rrd:total:MAX:reduce=AVERAGE \
>DEF:ch2max=currentcost.rrd:ch2:MAX:reduce=AVERAGE \
>DEF:solarmax=solarprod.rrd:total:MAX:reduce=AVERAGE \
>DEF:ch1max=currentcost.rrd:ch1:MAX:reduce=AVERAGE \
>
>Do I even need to create the RRD with MIN and MAX ? or it can be all
>done from the AVERAGE RRD database ?
>In my case, I create the RRD with:
>100 day of 1 minute average
>5 years of 5 minutes average
>
>rrdtool create currentcost.rrd -s 60 \
>DS:total:GAUGE:300:0:U \
>DS:ch1:GAUGE:300:0:U \
>DS:ch2:GAUGE:300:0:U \
>DS:ch3:GAUGE:300:0:U \
>RRA:AVERAGE:0.5:1:144000 \
>RRA:AVERAGE:0.5:5:525600 \
>RRA:MIN:0.5:1:144000 \
>RRA:MIN:0.5:5:525600 \
>RRA:MAX:0.5:1:144000 \
>RRA:MAX:0.5:5:525600
>
>Not creating a MIN and MAX one would reduce considerably the size rrd file.

You seem to have some very woolly thinking there. If you only store 
average, then you cannot possibly derive min and max from 
consolidated data. Eg, suppose someone boils the kettle and it takes 
3kW for one minute (and it happens to coincide nicely with one sample 
period). You could have a 5 minute period with values of 3,0,0,0,0 
which average to 0.6. There is no way to know the minimum other than 
it's <= 0.6, and no way to know the max other than it's >= 0.6. In 
this case they are 0 and 3, but they could just as well be 0.6 if it 
was a steady load.

Extend that to 30 minute and 2 hour consolidations and the difference 
widens - eg you boil the kettle but otherwise use no power and the 
max is still 3, the min is still 0, but the average is now only 0.1 
and 0.025 respectively.
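
The kettle example can be sketched in a few lines of Python. The
helper name kettle_window is hypothetical, and the load is the
idealised one described above: a single minute at 3kW and nothing
else in the window.

```python
def kettle_window(minutes, kettle_kw=3.0):
    """One minute at kettle_kw followed by (minutes - 1) minutes at zero load."""
    return [kettle_kw] + [0.0] * (minutes - 1)


def average(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


# Consolidation windows of 5 minutes, 30 minutes and 2 hours.
for minutes in (5, 30, 120):
    samples = kettle_window(minutes)
    print(
        f"{minutes:>3} min window: "
        f"min={min(samples):g} avg={average(samples):.3f} max={max(samples):g}"
    )
```

The min and max stay at 0 and 3 whatever the window, while the
average shrinks as the window grows, so the average alone says less
and less about the extremes.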

Now, if you are only interested in 5 minute smoothed data and NOT 1
minute min and max, then you are correct that you don't need to keep
separate MIN and MAX RRAs for a DS - provided you do not keep, or try
to graph, any further consolidation of that 5 minute data.

There are two main reasons people use consolidated storage :

1) To reduce storage requirements. Most people aren't bothered by the 
fine detail once it's moderately aged. So for example, at work I keep 
5 minute samples for only 2 days, I keep 1/2 hour consolidations for 
longer, 2 hour consolidations for longer still, and 1 day 
consolidations for 2 years. So I can graph in detail, or over a long 
time, but not both - I can't plot a detailed graph for data a year old
and that's fine by us.

2) To reduce the processing needed to generate graphs. Naturally, if you plot a
year long graph with 5 minute samples then the graphing program has 
to read in and consolidate a lot of data - and that takes time and 
memory. A while ago I realised I'd made a mistake and was storing 12 
hour data rather than 24 hour data - and graphing at 24 hour 
resolution. I found that re-working things significantly decreased 
runtime and memory requirements - particularly on the complex graphs 
with hundreds of data sources in them.
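
As a rough sketch of both points, the Python below does the arithmetic
for the RRA layout in the quoted rrdtool create command. It assumes
each stored value takes 8 bytes (one double) and ignores file headers,
so the sizes are approximate. The function names are made up for
illustration.

```python
STEP = 60             # base step in seconds, from "-s 60"
DATA_SOURCES = 4      # total, ch1, ch2, ch3
BYTES_PER_VALUE = 8   # assumption: one double per stored value, headers ignored

# (consolidation function, steps per row, rows), from the quoted command.
RRAS = [
    ("AVERAGE", 1, 144000),
    ("AVERAGE", 5, 525600),
    ("MIN", 1, 144000),
    ("MIN", 5, 525600),
    ("MAX", 1, 144000),
    ("MAX", 5, 525600),
]


def rra_bytes(rras):
    """Approximate bytes needed for the rows of the given RRAs."""
    return sum(rows for _, _, rows in rras) * DATA_SOURCES * BYTES_PER_VALUE


def span_days(steps, rows):
    """How many days of history an RRA with this geometry covers."""
    return steps * rows * STEP / 86400


def rows_for_one_year(steps):
    """Rows that must be read to graph 365 days from an RRA with this step."""
    return 365 * 86400 // (steps * STEP)


all_bytes = rra_bytes(RRAS)
avg_only_bytes = rra_bytes([r for r in RRAS if r[0] == "AVERAGE"])

print(f"1 min RRA covers {span_days(1, 144000):g} days")
print(f"5 min RRA covers {span_days(5, 525600):g} days")
print(f"all RRAs: ~{all_bytes / 1e6:.1f} MB, AVERAGE only: ~{avg_only_bytes / 1e6:.1f} MB")
print(f"rows per DS for a 1 year graph: {rows_for_one_year(5)} at 5 min, "
      f"{rows_for_one_year(1440)} at 1 day")
```

Under those assumptions the full layout needs about 64 MB and the
AVERAGE-only layout about a third of that, and a year-long graph reads
105120 rows per DS from the 5 minute RRA against 365 from a 1 day RRA.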

-- 
Simon Hobson
