# [rrd-users] Percentile off-by-one

Leander Koornneef l.koornneef at ic-s.nl
Thu Aug 20 12:37:25 CEST 2009

```Hi all,

On Aug 19, 9:48 am, t... at oetiker.ch (Tobias Oetiker) wrote:
> Hoi Joshua,
>
> Thursday Joshua Keroes wrote:
> > How is percentile calculated? We've been calculating this manually, since
> > before rrdtool offered a PERCENT function. We discovered that our manual
> > calculation and the one supplied by rrdtool differ by one index. For example
> > (percentile.pl program attached):
>
> > [Thu Aug 13, 19:58:41 | 692] $ percentile.pl -v --cf=AVERAGE --ds=ingress
> > --dsi=0 --start=1242864000 --end=1245542399 etkfgl624456ubielg01g.rrd
>
> > percentile.pl using RRDs-1.2019
> > etkfgl624456ubielg01g.rrd:
> >         main::ptile_fetch()
> >            start = 05/21/09
> >            end = 06/20/09
> >            rows = 31
> >            95% of 31 = 29.45
> >            95th %-ile row index = 29
> >            discarded rows of -NaN's = 0
> >         row 28 = *1126767.40*
> >         row 29 = *1521699.32*
> >         row 30 = 4132277.36
> >    manual  calculation using fetch() = *1521699.32*
> >    PERCENT calculation using graph() = *1126767.40*
> >    difference ~ 394931 (26%)
>
> > Our manual calculation is grabbing row 29 and PERCENT is grabbing row 28.
>
> hmmm the heart of the problem is, that you have too small a sample
> set ... if the difference between picking one row or the next is 26%
> you should NOT be using percentile since the results will be more
> or less random ...

I'm also experiencing discrepancies in PERCENT calculations as done by
rrdtool when compared to other tools, like the Statistics::Descriptive
Perl module. I've also tested with Excel and Numbers (from Apple iWork).
The difference between outcomes appears to increase with larger
datasets. Here are the 90th percentile results from rrdtool and my own
script (which uses Statistics::Descriptive):

# of days    rrdtool    Statistics::Descriptive
1            2.25066    2.26135
10           1.37383    1.49129
20           1.30663    1.47212
30           1.48260    1.67608

This last dataset (30 days) consists of about 9000 values and in this
dataset, the value 1.48260 is actually more like the 87th percentile.
Both Excel and Numbers agree with Statistics::Descriptive here, so I'm
inclined to suspect an error/bug in rrdtool?

Thanks,
Leander

```
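The one-row difference quoted in the message can be reproduced with simple index arithmetic. The Python sketch below is only an illustration: `index_floor_n` mirrors the manual calculation shown in the quoted output (95% of 31 = 29.45, row index 29), while `index_floor_n_minus_1` is a hypothetical convention that happens to yield row 28. Nothing in the thread confirms that rrdtool's PERCENT works this way. `percentile_interpolated` shows the linear-interpolation style that spreadsheet tools commonly use, again as an assumption rather than a description of any particular tool. All function names are made up for this example.

```python
import math


def index_floor_n(n, pct):
    """Row index as in the manual calculation: floor(pct% of n), 0-based."""
    return min(n - 1, math.floor(n * pct / 100))


def index_floor_n_minus_1(n, pct):
    """Hypothetical alternative: floor(pct% of the last valid index, n - 1)."""
    return math.floor((n - 1) * pct / 100)


def percentile_interpolated(values, pct):
    """Interpolate linearly between the two sorted values around the position."""
    data = sorted(values)
    pos = (len(data) - 1) * pct / 100  # fractional 0-based position
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


if __name__ == "__main__":
    n, pct = 31, 95  # row count and percentile from the quoted output
    print("floor(pct * n / 100)       ->", index_floor_n(n, pct))
    print("floor(pct * (n - 1) / 100) ->", index_floor_n_minus_1(n, pct))
```

With 31 rows and the 95th percentile, the two index conventions give 29 and 28, matching the rows reported for the manual and PERCENT results. This shows that an off-by-one of this kind can come from the index convention alone. It does not explain the larger gap reported for the 9000-value dataset.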