[rrd-users] Re: Question on 95th percentile
Alex van den Bogaerdt
alex at slot.hollandcasino.nl
Tue Feb 27 00:09:28 MET 2001
Clifton Royston wrote:
> > - 95% * 1234 == 1172.30
> > Do I take element 1171 or 1172 in this case? (1172-1, or 1173-1,
> > as the array is zero based)
>
> I believe there are statistical formulae for this; it seems to me I
> recall for maximum accuracy you interpolate between the two samples
> based on the fraction of the way through that it falls, so you would
> take v(1171) + 0.30*(v(1172)-v(1171)). (Similar to what you do when
> calculating the median of an even number of samples, you take halfway
> between the middle two values.)
That seems reasonable and can be done. So can both of your next
suggestions (skipped here: either discarding the NaNs or moving them
to the bottom of the array).
Let me rephrase the question. If the 95th percentile were to be
included in a future release of RRDtool, we would need to make sure
that it covers our needs. So far I have:
1) Make it configurable. If you want the 90th percentile, or the 99th,
or whatever, it will work. It'll even work if you ask for the 99.98th
percentile.
2) Sort the data into an array, keeping every NaN in the dataset but
placing it below everything else (not as zero; it sorts below negative
numbers).
3) Calculate which slot in the array is at or below the 95th percentile.
4) Return the data rate in that slot.
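The four steps above could be sketched roughly as follows. This is only an illustration in Python, not RRDtool code: the function name `percentile_floor` is made up, and the choice to count the slot from 1 and then subtract 1 for a zero-based array is an assumption about how "at or below" should be read.

```python
import math

def percentile_floor(rates, pct=95.0):
    """Return the value in the slot at or below the pct-th percentile.

    NaNs are kept and sorted below every real number (including
    negatives); the fractional slot position is rounded down.
    """
    if not rates:
        raise ValueError("no samples")
    # Step 2: NaNs first (False sorts before True), then real values ascending.
    ordered = sorted(
        rates,
        key=lambda v: (not math.isnan(v), 0.0 if math.isnan(v) else v),
    )
    # Step 3: slot counted from 1 and rounded down, e.g. 95% of 1234 -> 1172.
    position = math.floor(pct * len(ordered) / 100.0)
    # Assumption: convert to a zero-based index, clamped for tiny datasets.
    index = max(position - 1, 0)
    # Step 4: return the data rate in that slot.
    return ordered[index]
```

With 1234 samples and the default 95%, this picks zero-based element 1171. Using 1172 instead would mean dropping the `- 1`.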
I did so for the following reasons:
1) Not everybody wants to know the 95th
2) The 95th is to show how well you perform or how much the other
party should pay. In both cases, keeping the NaNs lowers the
outcome, so you performed poorly (NaNs are your fault) or you get
less money (NaNs are your fault) [1]
3+4) The fraction calculated is not necessarily valid for the data rates.
Perhaps the value in the next slot of the array is double the one in the
calculated slot. IMHO you can't assume it is all right to take 0.30
times the difference. Again, I round down so the result won't be
higher than the real number. It is arguable that one should use the
next slot of the array. If you think so, please say so when answering
the question below.
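To make the difference between the two approaches concrete, here is a small Python sketch that puts rounding down next to the interpolation Clifton described. The helper names and the sample data are invented for illustration, and the input is assumed to be already sorted and free of NaNs.

```python
import math

def slot_position(n, pct):
    """Split pct% of n samples into a whole slot count and a fraction."""
    exact = pct * n / 100.0
    whole = math.floor(exact)
    return whole, exact - whole

def round_down(sorted_vals, pct):
    """Value in the slot at or below the percentile (no interpolation)."""
    whole, _ = slot_position(len(sorted_vals), pct)
    return sorted_vals[max(whole - 1, 0)]

def interpolated(sorted_vals, pct):
    """v(lo) + fraction * (v(hi) - v(lo)), as in the quoted suggestion."""
    whole, frac = slot_position(len(sorted_vals), pct)
    lo = sorted_vals[max(whole - 1, 0)]
    hi = sorted_vals[min(whole, len(sorted_vals) - 1)]
    return lo + frac * (hi - lo)

# Made-up data where the next slot is double the calculated one.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 200.0]
print(round_down(samples, 95))    # 100.0
print(interpolated(samples, 95))  # 150.0
```

Here 95% of 10 samples is 9.5, so rounding down reports 100.0, a rate that was actually measured, while interpolation reports 150.0, a rate that never occurred.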
The question is: if it were implemented like this (don't hold
your breath!), would it be as it should be?
cheers,
Alex
[1] Consider an exam. You need to score 70% or better. If there
are 9 questions, you need to answer 6.3 questions correctly. You
can't, so you really need to answer 7 questions correctly. This
obviously calls for rounding up.
However, if you may answer at most 30% wrong, you can
only answer 2.7 questions wrong. In this case we should round
down to 2. We've discussed the 70th percentile and the 30th.
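The arithmetic in this footnote can be checked with a few lines of Python. The variable names are only for illustration, and integer percentages are used to avoid floating-point surprises.

```python
import math

questions = 9

# At least 70% correct: 6.3 questions is not possible, so round up.
min_correct = math.ceil(70 * questions / 100)

# At most 30% wrong: 2.7 questions is not possible, so round down.
max_wrong = math.floor(30 * questions / 100)

print(min_correct, max_wrong)  # 7 2
```

The two results add up to the 9 questions, which is why the "at least" case rounds up while the "at most" case rounds down.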