[rrd-users] Percentile off-by-one
Tobias Oetiker
tobi at oetiker.ch
Wed Aug 19 09:48:06 CEST 2009
Hoi Joshua,
Thursday Joshua Keroes wrote:
> How is percentile calculated? We've been calculating this manually, since
> before rrdtool offered a PERCENT function. We discovered that our manual
> calculation and the one supplied by rrdtool differ by one index. For example
> (percentile.pl program attached):
>
> [Thu Aug 13, 19:58:41 | 692] $ percentile.pl -v --cf=AVERAGE --ds=ingress
> --dsi=0 --start=1242864000 --end=1245542399 etkfgl624456ubielg01g.rrd
>
> percentile.pl using RRDs-1.2019
> etkfgl624456ubielg01g.rrd:
> main::ptile_fetch()
> start = 05/21/09
> end = 06/20/09
> rows = 31
> 95% of 31 = 29.45
> 95th %-ile row index = 29
> discarded rows of -NaN's = 0
> row 28 = *1126767.40*
> row 29 = *1521699.32*
> row 30 = 4132277.36
> manual calculation using fetch() = *1521699.32*
> PERCENT calculation using graph() = *1126767.40*
> difference ~ 394931 (26%)
>
>
> Our manual calculation is grabbing row 29 and PERCENT is grabbing row 28.
hmmm the heart of the problem is that you have too small a sample
set ... if the difference between picking one row or the next is 26%
you should NOT be using percentile, since the results will be more
or less random ...
the 'test' is pretty simple: let's say you have an array with 5
elements.
PERCENT 0 should thus pick the first element and PERCENT 100 the
last one, right ?
(5-1)* 0/100 = 0 OK
(5-1)*100/100 = 4 OK
qed.
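
To make the endpoint check concrete, here is a minimal C sketch. It is not the rrdtool source, and the function names are made up for illustration. It compares the (steps-1) index from rrdgraph.c with the steps * param / 100 variant proposed in the question, both truncated to an integer as the `int field` assignment would do.

```c
#include <assert.h>
#include <stddef.h>

/* Index as in the quoted VDEF_PERCENT code:
 * truncation of (steps - 1) * percent / 100. */
size_t index_steps_minus_one(size_t steps, double percent)
{
    assert(steps > 0 && percent >= 0.0 && percent <= 100.0);
    return (size_t) ((double) (steps - 1) * percent / 100.0);
}

/* Variant proposed in the question:
 * truncation of steps * percent / 100. */
size_t index_steps(size_t steps, double percent)
{
    assert(steps > 0 && percent >= 0.0 && percent <= 100.0);
    return (size_t) ((double) steps * percent / 100.0);
}

/* Returns 1 if index is a valid position in an array of steps elements. */
int index_in_bounds(size_t steps, size_t index)
{
    return index < steps;
}
```

With 5 elements, index_steps_minus_one() yields 0 for PERCENT 0 and 4 for PERCENT 100, while index_steps() yields 5 for PERCENT 100, which is one past the end of the array.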
in any event ... I guess we should be rounding, so maybe the line
should read
field = round((steps-1) * dst->vf.param / 100);
in your case it does not quite make a difference, but almost ...
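
A small C sketch of truncation versus rounding for this index, under stated assumptions: the function names are hypothetical, and rounding is written as "add 0.5, then truncate", a stand-in for round() that is only meant for non-negative values. For the 31 rows in this thread, (31 - 1) * 95 / 100 is exactly 28.5, so the case sits right on the tie and the outcome depends on how the tie is broken. C99 round() rounds halfway cases away from zero, which matches the stand-in here.

```c
#include <assert.h>

/* Index with plain truncation, as in the quoted rrdgraph.c line. */
int percent_field_trunc(int steps, double param)
{
    assert(steps > 0 && param >= 0.0 && param <= 100.0);
    return (int) ((steps - 1) * param / 100.0);
}

/* Index with round-half-up, standing in for round() on
 * non-negative values (assumption: param is within 0..100). */
int percent_field_rounded(int steps, double param)
{
    assert(steps > 0 && param >= 0.0 && param <= 100.0);
    return (int) ((steps - 1) * param / 100.0 + 0.5);
}
```

For steps = 31 and param = 95, percent_field_trunc() gives 28 (the row PERCENT picked) and percent_field_rounded() gives 29 (the row the manual calculation picked). At the endpoints both agree: with 5 elements, PERCENT 100 gives 4 either way.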
cheers
tobi
> From rrdgraph.c:
>
> case VDEF_PERCENT:{
>     rrd_value_t *array;
>     int field;
>
>     if ((array = malloc(steps * sizeof(double))) == NULL) {
>         rrd_set_error("malloc VDEV_PERCENT");
>         return -1;
>     }
>
>     for (step = 0; step < steps; step++) {
>         array[step] = data[step * src->ds_cnt];
>     }
>     qsort(array, step, sizeof(double), vdef_percent_compar);
>
>     field = (steps - 1) * dst->vf.param / 100; /* <======= array index */
>     dst->vf.val = array[field];
>     dst->vf.when = 0; /* no time component */
>     free(array);
> }
>
> Should the noted line perhaps read as follows?
>
> field = steps * dst->vf.param / 100;
>
> If not, what am I missing?
>
> Many thanks,
> Joshua
>
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900