[rrd-users] Re: Question on 95th percentile

Alex van den Bogaerdt alex at slot.hollandcasino.nl
Tue Feb 27 00:09:28 MET 2001


Clifton Royston wrote:

> > - 95% * 1234 == 1172.30
> >   Do I take element 1171 or 1172 in this case? (1172-1, or 1173-1,
> >   as the array is zero based)
> 
> I believe there are statistical formulae for this; it seems to me I
> recall for maximum accuracy you interpolate between the two samples
> based on the fraction of the way through that it falls, so you would
> take v(1171) + 0.30*(v(1172)-v(1171)).  (Similar to what you do when
> calculating the median of an even number of samples, you take halfway
> between the middle two values.)

That seems reasonable and can be done.  So can both of your next
suggestions (skipped here: either discarding the NaNs or moving them
to the bottom of the array).
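
For reference, the interpolation suggested above could look roughly
like this.  It is a minimal sketch in Python, not RRDtool code; the
function name and the handling of the two edge cases are assumptions.
Ranks are 1-based, list indexes are 0-based, and the input is assumed
to be sorted already and free of NaNs.

```python
import math

def interpolated_percentile(sorted_values, pct):
    """Interpolate linearly between the two samples around rank pct% * n."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("no samples")
    rank = pct / 100.0 * n        # e.g. 95% of 1234 samples gives about 1172.3
    whole = math.floor(rank)      # 1-based rank of the lower sample (1172)
    frac = rank - whole           # fraction of the way to the next one (0.3)
    if whole < 1:                 # assumption: clamp to the smallest sample
        return sorted_values[0]
    if whole >= n:                # assumption: clamp to the largest sample
        return sorted_values[-1]
    lower = sorted_values[whole - 1]   # 0-based index 1171 in the example
    upper = sorted_values[whole]       # 0-based index 1172 in the example
    return lower + frac * (upper - lower)
```

With 1234 samples and pct=95 this reads v(1171) and v(1172) and adds
0.30 of their difference to the lower one, as in the quoted formula.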


Let me rephrase the question.  If the 95th percentile were to go
into a future release of RRDtool, we would need to make sure that it
covers our needs.  So far I have:

1) Make it configurable.  If you want the 90th percentile, or the 99th,
   whatever, it will work.  It'll even work if you ask for the 99.98th
   percentile
2) Sort the data into an array, keeping every NaN in the dataset but
   placing it below everything else (not as zero; a NaN sorts below
   negative numbers as well)
3) Calculate which slot in the array is at or below the 95th percentile
4) Return the data rate in that slot
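
The four steps above can be sketched as follows.  This is only an
illustration in Python, not RRDtool code; the function name, the
default of 95 and the clamping of out-of-range ranks are assumptions.

```python
import math

def percentile_keep_nan(samples, pct=95.0):
    """Round-down percentile with every NaN kept at the bottom of the array."""
    if not samples:
        raise ValueError("no samples")
    # Step 2: sort ascending; the key puts NaNs first, below any real
    # number (including negative ones), without ever comparing a NaN.
    ordered = sorted(
        samples,
        key=lambda v: (False, 0.0) if math.isnan(v) else (True, v),
    )
    # Step 3: 1-based rank of the requested percentile (step 1: pct is
    # configurable), rounded down.  Floating-point products can land a
    # hair below a whole number, so real code may want a small tolerance.
    rank = math.floor(pct / 100.0 * len(ordered))
    index = min(max(rank - 1, 0), len(ordered) - 1)  # 0-based, clamped
    # Step 4: return the data rate in that slot.
    return ordered[index]
```

For 1234 samples and pct=95 the rank is 1172 (from 1172.30), so the
function returns the element at zero-based index 1171.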

I did so for the following reasons:
1) Not everybody wants to know the 95th
2) The 95th is there to show how well you perform or how much the
   other party should pay.  In both cases, keeping the NaNs lowers the
   outcome, so you performed poorly (NaNs are your fault) or you get
   less money (NaNs are your fault)  [1]
3+4) The fraction calculated is not necessarily valid for the data
   rates.  Perhaps the next slot in the array holds double the value
   of the calculated slot.  IMHO you can't assume it is all right to
   take 0.30 times the difference.  Again, I round down so the result
   won't be higher than the real number.  It is arguable that one
   should use the next slot of the array.  If so, please say so when
   answering the next question.
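
To make the difference between the three choices concrete, here is a
small Python illustration.  The ten data rates are made up for the
example; the last slot is deliberately double the one before it.

```python
import math

# Hypothetical, already sorted data rates.
ordered = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 20.0, 30.0, 50.0, 100.0]
pct = 95.0

rank = pct / 100.0 * len(ordered)         # 9.5, counted from 1
low = ordered[math.floor(rank) - 1]       # round down: slot 9, value 50.0
high = ordered[math.ceil(rank) - 1]       # next slot: slot 10, value 100.0
frac = rank - math.floor(rank)            # 0.5
interpolated = low + frac * (high - low)  # 75.0, a rate that was never measured

print(low, high, interpolated)
```

Rounding down reports 50, taking the next slot reports 100, and
interpolating reports 75, which does not occur in the data at all.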

The question is: if it were implemented like this (don't hold
your breath!), would it be as it should be?

cheers,
Alex

[1] Consider an exam.  You need to score 70% or better.  If there
    are 9 questions, you need to answer 6.3 questions correctly.  You
    can't, so you really need to answer 7 questions correctly.  This
    obviously asks for rounding up.
    However, if you may answer at most 30% in error, you can
    only answer 2.7 questions wrong.  In this case we should round
    down to 2.  We've discussed the 70th percentile and the 30th.
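
    The arithmetic of this footnote, as a short Python check (the
    variable names are only for illustration):

```python
import math

questions = 9
need_correct = math.ceil(0.70 * questions)   # 6.3 rounds up to 7
may_be_wrong = math.floor(0.30 * questions)  # 2.7 rounds down to 2

# Both views describe the same pass mark: 7 right plus 2 wrong is 9.
print(need_correct, may_be_wrong, need_correct + may_be_wrong == questions)
```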
