# [rrd-users] Re: percentile calculations

Alex van den Bogaerdt alex at ergens.op.het.net
Wed Sep 6 14:38:39 MEST 2006

On Wed, Sep 06, 2006 at 09:16:14AM +0200, Ralf Kruedewagen wrote:
> Hi,
>
> just a comment about how I get the percentiles from my RRDtool data:
>
> Cacti (a RRDtool GUI) has a variable "Nth Percentile", see
>
> Cacti does not use the built-in RRDtool function. UNKNOWN values are ignored
> and the result is as you would expect. Nth can be any number between 1 and
> 99.

For scenarios where the unknowns come from querying the wrong time
range, as in Dan Cech's scenario, this works.

> > As a very simplified example, say I'm 10 days into a month (with 20 days
> > remaining) and the values so far look like this:
> >
> > 1,2,3,4,5,6,7,8,9,10
> >
> > The 90th percentile should be 9, but in fact it will be 7 because
> > PERCENT will use the 20 NaN values in the calculation.

The series wanted is 1,2,3,4,5,6,7,8,9,10 but the series queried is
1,2,3,4,5,6,7,8,9,10,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U

The solution for Dan is simple.  He wants to look at 10 days, and
when he does, he gets the number he expects.
Indeed, ignoring the unknowns gives the same result.

However, suppose the unknowns are because of problems at the ISP...

The same 30 days in a month, the same known values, but not the
result of being at 10 days in the month.  Mix those unknown values
anywhere in the series, their place doesn't matter.  After sorting,
the series becomes:
U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,U,1,2,3,4,5,6,7,8,9,10

Now, using the method of ignoring unknowns, I have to pay for 9
{whatevers}, instead of 7 {whatevers}.

Suppose each unknown value is assumed to be the average of the known values:

average{1,2,3,4,5,6,7,8,9,10} is 5.5
1,2,3,4,5, {20 times 5.5} ,6,7,8,9,10

The 90th percentile (the 27th of the 30 sorted values) is still 7.
So, assuming average traffic during those unknown intervals
(monitoring/billing didn't work) still results in 7, not 9.
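
The three treatments discussed so far can be compared in a small Python
sketch.  It assumes a nearest-rank percentile (take the value at position
ceil(p/100 * n) in the sorted series) and represents an unknown as None.
The function names are made up for illustration; this is not how RRDtool
or Cacti implement anything, only the arithmetic from this thread.

```python
from statistics import mean


def nearest_rank(ordered, pct):
    """Nearest-rank percentile of an already sorted list (assumed method)."""
    # Integer ceil(pct * n / 100), clamped so that pct=0 still picks a value.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def pct_unknowns_lowest(series, pct):
    """Keep the unknowns and sort them below every known value."""
    ordered = sorted(series, key=lambda v: (v is not None, v or 0))
    return nearest_rank(ordered, pct)


def pct_ignoring_unknowns(series, pct):
    """Drop the unknowns before ranking."""
    return nearest_rank(sorted(v for v in series if v is not None), pct)


def pct_with_fill(series, pct, fill):
    """Replace every unknown by a fixed substitute value."""
    return nearest_rank(sorted(fill if v is None else v for v in series), pct)


# 10 known values and 20 unknowns; their position in the series is irrelevant.
series = list(range(1, 11)) + [None] * 20
average = mean(v for v in series if v is not None)   # 5.5

print(pct_unknowns_lowest(series, 90))      # 7
print(pct_ignoring_unknowns(series, 90))    # 9
print(pct_with_fill(series, 90, average))   # 7
```

Only the variant that throws the unknowns away arrives at 9.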

What number do I have to substitute to get 9 ?

1,2,3,4,5,6,7,8, { unknowns } ,9,10
1,2,3,4,5,6,7,8,9, { unknowns } ,10

In both cases, the 27th value is one of the unknowns.  If this has
to result in 9, the unknowns are 9.
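
The same question can be checked by brute force.  The sketch below, again
assuming a nearest-rank percentile and a hypothetical helper name, tries a
few substitute values for the 20 unknowns and reports the resulting 90th
percentile.

```python
def percentile_with_fill(known, n_unknown, fill, pct=90):
    """Nearest-rank percentile after replacing every unknown by `fill`."""
    ordered = sorted(list(known) + [fill] * n_unknown)
    rank = max(1, (pct * len(ordered) + 99) // 100)   # integer ceil
    return ordered[rank - 1]


known = range(1, 11)          # the ten known values 1..10
candidates = (0, 5.5, 8, 8.5, 9, 9.5, 10)

results = {fill: percentile_with_fill(known, 20, fill) for fill in candidates}
for fill, value in results.items():
    print(f"unknowns = {fill:>4}: 90th percentile = {value}")
```

Of the candidates tried, only a substitute of exactly 9 produces 9.  Anything
up to 7 leaves the result at 7, and above that the 27th value is simply the
substitute itself.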

In other words: the billing process at the provider doesn't work
properly.  They get unknowns.  To hide this problem, they effectively
say I had a high bandwidth utilization during those unknown intervals.

Is this fair?

--
Alex van den Bogaerdt
http://www.vandenbogaerdt.nl/rrdtool/
