[rrd-users] Re: percentile calculations
Simon Hobson
linux at thehobsons.co.uk
Tue Sep 5 20:56:37 MEST 2006
Alex van den Bogaerdt wrote:
> > However, when I'm displaying the graph for the current month, the
>> PERCENT function is using all the unknown future values in the
>> calculation
>
>sure
>
>> causing it to be incorrect.
>
>Why?
Seems obvious enough to me !
>You seem to know about your "unknown" data. That means it isn't
>as unknown as the name suggests...
Just because you know that it's unknown doesn't make it any less
unknown in value - without wanting to sound like a politician talking
crap about unknown unknowns and known unknowns !
> > As a very simplified example, say I'm 10 days into a month (with 20 days
>> remaining) and the values so far look like this:
>>
>> 1,2,3,4,5,6,7,8,9,10
>>
>> The 90th percentile should be 9
>
>according to what/who ?
Common usage ?
Seems obvious enough to me that 9 is the value which 90% of the
values in the list are equal to or less than. Isn't that what a
percentile is about? OK, it's a bit coarse with so few samples.
> > I have looked through the documentation and can't find any mechanism
>> which would allow me to restrict the PERCENT function to a specific date
>> range (to exclude values in the future), or exclude NaN values.
>
>Why graph values in the future, you know this won't include useful data.
But it may well produce useful graphs ! For some reason accountants
seem to like pigeonholing numbers into arbitrary calendar units
unrelated to what's actually going on in a business. One example that
comes easily to mind is an accountant who wants the sales figure for
the current month graphing - not the last 30 days, but the current
calendar month. Unless you have found a working crystal ball, at any
point before the end of the month you will have unknown values in the
graph.
If the above samples (ie 1 .. 10) were values for the 1st through
10th of the month, then the right place to draw the line would be at
9 - ignoring unknown values for the 11th through the 28th, 30th or 31st
(whichever ends the month). If you assumed zero for future samples then
the line would incorrectly end up at 7. Similarly your average would end
up at a little under 2 instead of 5.5 ! I don't think any accountant
would accept 1.8 as an average of
sales so far this month from those numbers.
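To make the arithmetic above concrete, here is a minimal sketch in plain Python (not RRDtool code). The helper names `nearest_rank_percentile` and `mean`, and the use of `None` to stand for an unknown sample, are assumptions made for illustration. It uses the nearest-rank reading of a percentile, i.e. the smallest value that at least 90% of the samples are equal to or less than, and assumes a 30-day month.

```python
def nearest_rank_percentile(values, pct):
    """Smallest known value that at least pct% of the known samples are <= to."""
    known = sorted(v for v in values if v is not None)
    if not known:
        return None
    # Integer form of ceil(pct * n / 100), avoiding floating-point rounding.
    rank = max(1, (pct * len(known) + 99) // 100)
    return known[rank - 1]


def mean(values):
    """Average of the known samples only; None when nothing is known yet."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None


# Ten known days (1..10) followed by twenty days that have not happened yet.
month = list(range(1, 11)) + [None] * 20
# The same month with every unknown sample replaced by zero.
zero_filled = [0 if v is None else v for v in month]

print(nearest_rank_percentile(month, 90), mean(month))                        # 9 5.5
print(nearest_rank_percentile(zero_filled, 90), round(mean(zero_filled), 2))  # 7 1.83
```

Skipping the unknown samples gives 9 and 5.5; zero-filling them gives 7 and roughly 1.83, which is the skew described above.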
Changing things a little, suppose there are 10 units of sales on day
1, would you accept a figure of .33 units/day as the average sales so
far this month or would you expect 10 ?
As another example, my ISP will give me graphs of my bandwidth usage
over a billing period. On the 10th of each month it starts out with a
nearly empty graph and it fills up until the 9th of the following
month. What's useful for me is not an average calculated as "total to
date/30" but "total to date/days so far".
OK, it would probably be equally (possibly more) useful to show "last
30 days", but current billing period is what we get.
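The "total to date/days so far" idea for a billing period can be sketched like this in plain Python. The function name `daily_average_so_far`, the 5400 MB total and the dates are all hypothetical, chosen only to show the difference between the two divisors.

```python
from datetime import date


def daily_average_so_far(total, period_start, today):
    """Average per day over the days elapsed so far, counting today."""
    days_so_far = (today - period_start).days + 1
    return total / days_so_far


period_start = date(2006, 8, 10)   # billing period starts on the 10th
today = date(2006, 9, 5)           # assumed "now", 27 days into the period
total_mb = 5400                    # hypothetical usage to date

print(daily_average_so_far(total_mb, period_start, today))  # 200.0
print(total_mb / 30)                                         # 180.0
```

Dividing by a fixed 30 understates the daily rate until the period is over; dividing by the days elapsed does not.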
>try changing unknown into some known value, like zero or a very large
>negative number
I fail to see how that will help - it will just further skew the data.
The answer would appear to be to do the calculations over the range
"1st of month" to "today" whilst plotting them on an X axis from "1st
of month" to "end of month" - is that easy to do ?
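What is being asked for can be illustrated outside RRDtool with a short Python sketch. It says nothing about whether or how RRDtool itself can do this; the names `month_axis` and `split_known`, the sample data and the assumed date are hypothetical. The X axis covers the whole calendar month, while the calculation only sees samples up to today.

```python
import calendar
from datetime import date


def month_axis(today):
    """Every calendar day of today's month: the X axis, first day to last."""
    last_day = calendar.monthrange(today.year, today.month)[1]
    return [date(today.year, today.month, d) for d in range(1, last_day + 1)]


def split_known(axis, samples, today):
    """Return the series to plot (None after today) and the values to calculate over."""
    plotted = [samples.get(d) if d <= today else None for d in axis]
    known = [v for v in plotted if v is not None]
    return plotted, known


today = date(2006, 9, 10)                               # assumed "now"
samples = {date(2006, 9, d): d for d in range(1, 11)}   # hypothetical values 1..10

axis = month_axis(today)
plotted, known = split_known(axis, samples, today)
# A flat line drawn across the whole month, computed from known days only.
average_line = [sum(known) / len(known)] * len(axis)

print(len(axis), len(known), average_line[0])  # 30 10 5.5
```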