[rrd-users] Usual RRD question on AVERAGE, MAX and consolidation :)

Steve Shipway s.shipway at auckland.ac.nz
Tue Dec 31 23:34:44 CET 2013


>If I compare the last 100 samples for example, from both AVERAGE and
>MAX, they are all the same line for line (although I am just showing
>10 below to reduce the size of this email):

This is expected: the RRA being used has 1cdp=1pdp, so AVERAGE and MAX yield the same value (the maximum of a set of one value is the same as the average of a set of one value).

I am assuming here that you have a single very large RRA of 1cdp=1pdp and no other RRAs. If you do have some consolidation RRAs defined, they will affect what the graphing functions return.

One thing to note: if you have RRDtool 1.4.x, you can have RRDtool calculate the 95th percentile for you with a VDEF (PERCENT, or PERCENTNAN), removing the need for the external script.

>Using AVERAGE produces lower values on a graph for
>current/average/max/total statistics at the bottom of the graph, and
>the graph is drawn differently showing this. This is the same DS as
>above but using AVERAGE instead of MAX, you will notice the 95th
>percentile value is the same because it is generated by the external
>PHP script as I mentioned: http://i.imgur.com/P27kjfc.png

This is because, when drawing a graph, additional consolidation may be performed, and other RRAs may be selected if you have them defined.

When making a graph, RRDtool selects the RRA whose resolution most closely matches the time span covered by one pixel of the graph. If you only have the one RRA (1cdp=1pdp), then that one has to be used. Next, it may need to perform additional consolidation, for example if 1 pixel = 2 CDPs. In that case it has to average (or max) each such set of CDPs as well; which consolidation function is used depends on the CF in your DEF declaration.
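To make that extra consolidation step concrete, here is a hypothetical Python sketch (not RRDtool code) that collapses every two CDPs into one pixel value using a chosen CF. The helper name `consolidate` and the sample numbers are made up for illustration, and RRDtool's exact bin alignment and NaN handling are not modelled.

```python
def consolidate(values, factor, cf):
    """Collapse every `factor` consecutive CDPs into one value using `cf`.

    Hypothetical helper: it only mimics the idea of on-the-fly graph
    consolidation when one pixel covers several CDPs. RRDtool's exact bin
    alignment and its NaN handling are not modelled here.
    """
    reducers = {
        "AVERAGE": lambda b: sum(b) / len(b),
        "MAX": max,
        "MIN": min,
        "LAST": lambda b: b[-1],
    }
    reduce_fn = reducers[cf]
    # A trailing partial bin (if len(values) is not a multiple of factor)
    # is consolidated on its own.
    return [reduce_fn(values[i:i + factor]) for i in range(0, len(values), factor)]


# Ten 1cdp=1pdp samples containing one short burst (made-up numbers).
cdps = [10, 12, 11, 95, 10, 13, 12, 11, 10, 12]

by_avg = consolidate(cdps, 2, "AVERAGE")  # CF taken from DEF:...:AVERAGE
by_max = consolidate(cdps, 2, "MAX")      # what reduce=MAX asks for instead

print(by_avg)                               # [11.0, 53.0, 11.5, 11.5, 11.0]
print(by_max)                               # [12, 95, 13, 12, 12]
print(max(cdps), max(by_avg), max(by_max))  # 95 53.0 95
```

The burst of 95 survives in the MAX series but is halved to 53 in the AVERAGE series, which is exactly the MAX(AVG(x)) effect.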

As a result, you may find that the data set is further averaged after being selected.

This is not a problem for the average summary statistics, but for MAX you get MAX(AVG(x)), which can never be higher than MAX(x) and will usually be lower. The way around this is to define a second DEF that uses MAX and use it for the MAX statistics, though this can be wasteful: it requires a corresponding MAX RRA in the file, which is redundant if your normal RRA is 1cdp=1pdp.
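Why MAX(AVG(x)) can never exceed MAX(x) follows in one step. Writing x_1, ..., x_n for the CDPs and B_1, ..., B_m for the sets of CDPs that share a pixel (notation introduced here, and assuming no unknown/NaN values), every pixel average is at most the largest value in its own bin:

```latex
\bar{x}_{B_k} = \frac{1}{|B_k|}\sum_{j \in B_k} x_j \;\le\; \max_{j \in B_k} x_j
\quad\Longrightarrow\quad
\max_{k} \bar{x}_{B_k} \;\le\; \max_{k}\,\max_{j \in B_k} x_j = \max_{j} x_j
```

Equality holds only if some pixel's CDPs all equal the overall maximum, so a short spike sharing a pixel with quieter samples always comes out lower.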

So, this explains why you see lower MAX values than you expect. I originally had this problem in Routers2, but changed the code to explicitly use the MAX RRA when calculating the MAX stats and the AVERAGE RRA for the AVERAGE and LAST stats. At least you don't get the problem of the 95th percentile becoming more inaccurate as the granularity gets coarser, since you are working on 1cdp=1pdp throughout and the calculation is performed on the unconsolidated data.

In this case, RRDtool is being a little too 'helpful': it performs the additional consolidation step before calculating the statistics, but with an inappropriate consolidation function. You need to override its choice and force it to use MAX; under RRDtool 1.4.x you can give a CF override on the DEF declaration using 'reduce'.

For example:

DEF:x=foo.rrd:ds:AVERAGE
DEF:x2=foo.rrd:ds:AVERAGE:reduce=MAX
VDEF:avgx=x,AVERAGE
VDEF:maxx=x2,MAXIMUM
VDEF:percx=x,95,PERCENTNAN
LINE:x#00ff00:Average value
HRULE:percx#ff0000:95th Percentile
GPRINT:avgx:Average is %.2lf %sbps
GPRINT:maxx:Maximum is %.2lf %sbps

See how the x2 DEF specifies the MAX reduction function and is used only for creating the maxx VDEF, which is only used for the GPRINT. Everything else uses the x DEF. I've also used the PERCENTNAN function to calculate the 95th percentile internally, though this may also fall foul of the consolidation step, resulting in a value lower than the true one; I've not yet tested this.
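The untested percentile concern can at least be illustrated outside RRDtool. The following is a hypothetical Python sketch, assuming a textbook nearest-rank percentile that skips NaNs (RRDtool's own PERCENT/PERCENTNAN rounding may differ) and a graph where one pixel covers two CDPs. The helper names and sample data are made up. It shows how averaging before taking the percentile can pull the 95th percentile down for bursty traffic, though this is not guaranteed for every data set.

```python
import math


def percentile_nearest_rank(values, p):
    """Nearest-rank p-th percentile, skipping NaNs (in the spirit of PERCENTNAN).

    Assumed textbook definition for illustration only; RRDtool's own
    PERCENT/PERCENTNAN rounding may pick a neighbouring value.
    """
    clean = sorted(v for v in values if not math.isnan(v))
    if not clean:
        return math.nan
    rank = math.ceil(p * len(clean) / 100)  # 1-based rank
    return clean[max(rank, 1) - 1]


def average_bins(values, factor):
    """Average every `factor` consecutive values (1 pixel = `factor` CDPs)."""
    out = []
    for i in range(0, len(values), factor):
        bin_ = values[i:i + factor]
        out.append(sum(bin_) / len(bin_))
    return out


# Twenty made-up 1cdp=1pdp samples: quiet traffic with three short bursts.
samples = [10, 10, 10, 100, 10, 10, 10, 10, 10, 90,
           10, 10, 10, 10, 10, 10, 10, 10, 10, 80]

raw_p95 = percentile_nearest_rank(samples, 95)
consolidated_p95 = percentile_nearest_rank(average_bins(samples, 2), 95)

print(raw_p95, consolidated_p95)  # 90 55.0
```

Each burst shares its pixel with a quiet sample, so averaging roughly halves it, and the upper tail the percentile looks at shrinks with it.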

Have a good Christmas and new year...

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
s.shipway at auckland.ac.nz
Ph: +64 9 373 7599 ext 86487
