# [rrd-users] Percentile consolidation

Donovan Baarda abo at minkirri.apana.org.au
Mon Oct 26 00:02:29 CET 2015

```
Note that variance, and hence stddev, can be calculated incrementally by
keeping a timeseries of the average of the squared rate: variance =
average(rate^2) - average(rate)^2, stddev = sqrt(variance). Assuming a
normal distribution, 95th percentile = average + 1.645*stddev (average +
2*stddev is closer to the 97.7th percentile). The accuracy of this depends
on how closely your samples match a normal distribution, and it is not as
resilient to outliers as calculating a true 95th percentile from all the
samples, but it's a pretty good approximation. If you know your
distribution is closer to log-normal (which it often is for things like
latency), you can calculate a more accurate 95th percentile from the
average and variance like this:

mu = ln(avg) - ln(var/avg**2 + 1)/2
sigma = sqrt(ln(var/avg**2 + 1))
p95 = lognorminv(0.95, mu, sigma)

Unfortunately, right now rrd doesn't support RRAs of type variance
(CF=VAR?) or mean value squared (CF=AVERAGE2?). However, if you were going
to request a feature, this is something that is definitely possible, because
both can be consolidated from earlier results. A true 95th percentile RRA
is definitely not. Another ugly approximation uses bucketed distributions,
but I wouldn't request that.

Note that having an RRA of type CF=AVERAGE2 is also useful for calculating
the "root mean square", which is needed for e.g. AC power calculations.
Also, stddev is actually the "root mean square" of the distance from the
mean.

On 25 October 2015 at 10:43, Steve Shipway <s.shipway at auckland.ac.nz> wrote:

> Percentiles cannot be calculated incrementally; you need the entire
> dataset to deduce them, whereas mean, max and min only require the last
> calculation result and possibly the number of samples so far. Hence you
> cannot have the percentile as a CF.
>
> However, the RRDTool RPN functions include a percentile calculator, so you
> can still deduce this on the fly as you graph, using the available samples.
> You would need to be careful to ensure that the data series over which you
> are aggregating is of maximum granularity, though, if you want to ensure
> maximum accuracy.
>
> Steve
>
> *Steve Shipway*
> University of Auckland ITS
> s.shipway at auckland.ac.nz
> Ph: +64 9 373 7599 ext 86487
>
> ------------------------------
> *From:* rrd-users [rrd-users-bounces+s.shipway=
> auckland.ac.nz at lists.oetiker.ch] on behalf of Pablo Chacin [
> pchacin at sensefields.com]
> *Sent:* Saturday, 24 October 2015 11:43 p.m.
> *To:* rrd-users at lists.oetiker.ch
> *Subject:* [rrd-users] Percentile consolidation
>
> Greetings
>
> Being able to pre-calculate and store certain data percentiles, like the
> median and the 95th percentile, is a common requirement for any metrics
> database, as these aggregation functions are much more stable and
> representative of the data than the average or maximum values.
>
> I saw that the mean was recently included as a consolidation function in
> rrdtool, but it is still not possible to calculate other arbitrary
> percentiles. Interestingly, percentiles have been available when retrieving
> data for graphs or reporting.
>
> Is there any compelling reason not to include percentiles as consolidation
> functions? Is there any plan to do so in the future?
>
> Regards
>
>
> ---------------------------
> Pablo Chacin
> CTO
> SenseFields SL
> Tlf (+34) 93 250 45 98
> Gran Via 674, principal 1º
> 08010 Barcelona, Spain
> http://www.sensefields.com

--
Donovan Baarda <abo at minkirri.apana.org.au>
```
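
The approach in the reply above can be sketched in Python. This is only an illustration under stated assumptions, not anything rrdtool provides: the function names (`consolidate`, `merge`, `p95_normal`, `p95_lognormal`) are hypothetical, the buckets being merged are assumed to hold the same number of samples, and the `lognorminv(0.95, mu, sigma)` call from the message is written out as `exp(mu + z*sigma)`, where `z` is the standard normal 95% quantile (about 1.645).

```python
import math
from statistics import NormalDist

# Standard normal 95% quantile, roughly 1.645.
Z95 = NormalDist().inv_cdf(0.95)


def consolidate(samples):
    """Return (average, average of squares) for one bucket of samples."""
    n = len(samples)
    avg = sum(samples) / n
    avg2 = sum(x * x for x in samples) / n
    return avg, avg2


def merge(buckets):
    """Merge equal-sized buckets by averaging each series.

    This is the step a percentile cannot do: both series can be
    re-consolidated from earlier results without the raw samples.
    """
    n = len(buckets)
    avg = sum(b[0] for b in buckets) / n
    avg2 = sum(b[1] for b in buckets) / n
    return avg, avg2


def variance(avg, avg2):
    """variance = average(x^2) - average(x)^2, clamped against rounding."""
    return max(avg2 - avg * avg, 0.0)


def p95_normal(avg, avg2):
    """95th percentile estimate assuming a normal distribution."""
    return avg + Z95 * math.sqrt(variance(avg, avg2))


def p95_lognormal(avg, avg2):
    """95th percentile estimate assuming a log-normal distribution.

    Requires avg > 0. Uses the mu/sigma formulas from the message.
    """
    ratio = variance(avg, avg2) / avg**2 + 1
    mu = math.log(avg) - math.log(ratio) / 2
    sigma = math.sqrt(math.log(ratio))
    return math.exp(mu + Z95 * sigma)


if __name__ == "__main__":
    hour1 = consolidate([1, 2, 3, 4])
    hour2 = consolidate([5, 6, 7, 8])
    avg, avg2 = merge([hour1, hour2])
    print("normal p95:    ", p95_normal(avg, avg2))
    print("log-normal p95:", p95_lognormal(avg, avg2))
```

Merging the two consolidated buckets gives the same average and average of squares as consolidating all eight samples at once, which is why a CF of this kind is feasible while a true percentile CF is not.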