[rrd-users] Percentile off-by-one

Tobias Oetiker tobi at oetiker.ch
Wed Aug 19 09:48:06 CEST 2009


Hi Joshua,

Thursday Joshua Keroes wrote:

> How is percentile calculated? We've been calculating this manually, since
> before rrdtool offered a PERCENT function. We discovered that our manual
> calculation and the one supplied by rrdtool differ by one index. For example
> (percentile.pl program attached):
>
> [Thu Aug 13, 19:58:41 | 692] $ percentile.pl -v --cf=AVERAGE --ds=ingress --dsi=0 --start=1242864000 --end=1245542399 etkfgl624456ubielg01g.rrd
>
> percentile.pl using RRDs-1.2019
> etkfgl624456ubielg01g.rrd:
>         main::ptile_fetch()
>            start = 05/21/09
>            end = 06/20/09
>            rows = 31
>            95% of 31 = 29.45
>            95th %-ile row index = 29
>            discarded rows of -NaN's = 0
>         row 28 = *1126767.40*
>         row 29 = *1521699.32*
>         row 30 = 4132277.36
>    manual  calculation using fetch() = *1521699.32*
>    PERCENT calculation using graph() = *1126767.40*
>    difference ~ 394931 (26%)
>
>
> Our manual calculation is grabbing row 29 and PERCENT is grabbing row 28.

Hmm, the heart of the problem is that you have too small a sample
set ... if picking one row or the next changes the result by 26%,
you should NOT be using a percentile, since the results will be more
or less random ...


The 'test' is pretty simple: let's say you have an array with 5
elements.

PERCENT 0 should pick the first element (index 0) and PERCENT 100
the last one (index 4), right?

    (5-1)*  0/100 = 0   OK
    (5-1)*100/100 = 4   OK

QED.
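The endpoint test above, and the one-row difference in the question, can be checked with a small C sketch. The helper names index_rrdtool and index_proposed are hypothetical; they model only the index arithmetic of the quoted rrdgraph.c line and of the alternative line suggested in the question, with the cast to an integer type truncating toward zero the same way the assignment to the int field does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index rule of the quoted rrdgraph.c line,
 * (steps - 1) * percent / 100, truncated toward zero by the cast. */
static size_t index_rrdtool(size_t steps, double percent)
{
    assert(steps > 0);
    return (size_t) ((steps - 1) * percent / 100.0);
}

/* Hypothetical helper: the alternative rule steps * percent / 100.
 * For percent == 100 this returns steps, one past the last valid index. */
static size_t index_proposed(size_t steps, double percent)
{
    assert(steps > 0);
    return (size_t) (steps * percent / 100.0);
}
```

With 5 elements, index_rrdtool gives 0 and 4 for PERCENT 0 and 100, while index_proposed gives 5 for PERCENT 100, which would read past the end of the array. With 31 rows at PERCENT 95 they give 28 and 29 respectively, which is the one-row difference reported in the question.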

In any event ... I guess we should maybe be rounding, in which case
the line should read

field = round((steps-1) * dst->vf.param / 100);

In your case it does not quite make a difference, but almost ...
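A sketch comparing the truncating rule with the suggested rounding rule may help here. Assuming graph() sees the same 31 rows as the fetch and a PERCENT value of 95, the raw index is (31 - 1) * 95 / 100 = 28.5, which is exactly representable as a double and sits right on the rounding boundary. The helper names are hypothetical, and the rounding is written by hand (halves go away from zero, matching C's round() for non-negative values) so the sketch needs neither <math.h> nor -lm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: truncating rule, as in the quoted rrdgraph.c code. */
static size_t index_truncated(size_t steps, double percent)
{
    assert(steps > 0);
    return (size_t) ((steps - 1) * percent / 100.0);
}

/* Hypothetical helper: rounding rule, as in the suggested round() line.
 * For x >= 0 the fractional part x - trunc(x) is computed exactly, so
 * this gives the same result as round(x): exact halves are rounded up. */
static size_t index_rounded(size_t steps, double percent)
{
    assert(steps > 0 && percent >= 0.0);
    double x = (steps - 1) * percent / 100.0;
    size_t i = (size_t) x;          /* truncate toward zero */
    if (x - (double) i >= 0.5)      /* fractional part decides */
        i++;
    return i;
}
```

Under those assumptions, truncation picks index 28 and rounding picks index 29 for the 31-row case. Both rules still map PERCENT 0 to the first element and PERCENT 100 to the last, so the endpoint test keeps passing either way.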

cheers
tobi

> From rrdgraph.c:
>
> case VDEF_PERCENT:{
>     rrd_value_t *array;
>     int       field;
>
>     if ((array = malloc(steps * sizeof(double))) == NULL) {
>         rrd_set_error("malloc VDEV_PERCENT");
>         return -1;
>     }
>
>     for (step = 0; step < steps; step++) {
>         array[step] = data[step * src->ds_cnt];
>     }
>     qsort(array, step, sizeof(double), vdef_percent_compar);
>
>     field = (steps - 1) * dst->vf.param / 100; /* <======= array index */
>     dst->vf.val = array[field];
>     dst->vf.when = 0;   /* no time component */
>     free(array);
> }
>
> Should the noted line perhaps read as follows?
>
>   field = steps * dst->vf.param / 100;
>
> If not, what am I missing?
>
> Many thanks,
> Joshua
>
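For reference, here is a self-contained sketch of the same copy, sort and pick steps as the quoted VDEF_PERCENT case. The function name percentile_value and the comparator are hypothetical, the stride parameter plays the role of src->ds_cnt in the quoted code, and this comparator does not handle NaN, so the sketch assumes NaN-free input.

```c
#include <assert.h>
#include <stdlib.h>

/* Ascending order for qsort; assumes no NaN values. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: returns 0 and stores the percentile in *out,
 * or -1 on bad arguments or allocation failure. data holds steps rows
 * of stride interleaved values; only the first value of each row is used. */
static int percentile_value(const double *data, size_t steps, size_t stride,
                            double percent, double *out)
{
    assert(data != NULL && out != NULL);
    if (steps == 0 || stride == 0 || percent < 0.0 || percent > 100.0)
        return -1;

    double *array = malloc(steps * sizeof *array);
    if (array == NULL)
        return -1;

    /* Copy one value per row, as data[step * src->ds_cnt] does. */
    for (size_t step = 0; step < steps; step++)
        array[step] = data[step * stride];
    qsort(array, steps, sizeof *array, compare_doubles);

    /* Same truncating index rule as the quoted line. */
    size_t field = (size_t) ((steps - 1) * percent / 100.0);
    *out = array[field];
    free(array);
    return 0;
}
```

For the five values 50, 10, 40, 20, 30 and PERCENT 95, the index is (5 - 1) * 95 / 100 = 3.8, truncated to 3, which selects 40 from the sorted array.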

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900


