[rrd-users] Re: percentile calculations

Dan Cech dcech at phpwerx.net
Tue Sep 5 21:05:24 MEST 2006


Thanks for the feedback, I've clarified each point below.

Alex van den Bogaerdt wrote:
> On Tue, Sep 05, 2006 at 02:01:03PM -0400, Dan Cech wrote:
>> However, when I'm displaying the graph for the current month, the
>> PERCENT function is using all the unknown future values in the
>> calculation
> sure
>>              causing it to be incorrect.
> Why?

In this case I don't mean incorrect from a mathematical standpoint, but
in terms of the purpose of the report.

> You seem to know about your "unknown" data.  That means it isn't
> as unknown as the name suggests...

Yes, I can easily disregard it for reporting purposes; that's not the
problem.

My problem is that my employer requires display of a graph showing the
data for the current calendar month, with the 95th percentile line overlaid.

I'm attempting to find a way to produce this without having to use a
separate call to calculate the 95th percentile and manually specify it
for the graph, though it appears that this may be my only workable
solution.

>> As a very simplified example, say I'm 10 days into a month (with 20 days
>> remaining) and the values so far look like this:
>> 1,2,3,4,5,6,7,8,9,10
>> The 90th percentile should be 9
> according to what/who ?

The 90th percentile of the values above is 9.

I understand that the PERCENT function will include the 20 UNKN values
and produce the answer 7 (the 20 unknowns sort below the 10 known
values, so the 27th of the 30 sorted values is 7), which is also
mathematically correct but 'incorrect' from the point of view of this
particular application.
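
To make the arithmetic concrete, here is a minimal sketch in Python. It
assumes a simple nearest-rank percentile and models unknowns as NaN
sorted below every number; the function name is hypothetical and this
is not rrdtool's actual implementation, only an illustration of the
effect described above.

```python
import math


def percentile_unknown_lowest(values, pct):
    """Nearest-rank percentile where NaN (unknown) sorts below any number.

    pct is an integer percentage such as 90 or 95.
    """
    # Sort NaN first, then real numbers in ascending order.
    ordered = sorted(values, key=lambda v: (0, 0.0) if math.isnan(v) else (1, v))
    # 1-based rank = ceil(pct * n / 100), computed with integer arithmetic.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


known = [float(v) for v in range(1, 11)]  # the 10 days seen so far
month = known + [math.nan] * 20           # plus 20 unknown future days

print(percentile_unknown_lowest(known, 90))  # 9.0
print(percentile_unknown_lowest(month, 90))  # 7.0
```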

>> I have looked through the documentation and can't find any mechanism
>> which would allow me to restrict the PERCENT function to a specific date
>> range (to exclude values in the future), or exclude NaN values.
> Why graph values in the future, you know this won't include useful data.

See above, my employer requires that the graph show the calendar month.

> try changing unknown into some known value, like zero or a very large
> negative number

This will have the same effect as UNKN, if I'm reading the relevant
documentation correctly:

Unknown values are considered lower than any finite number for this
purpose so if this operator returns an unknown you have quite a lot of
them in your data. Infinite numbers are lesser, or more, than the finite
numbers and are always more than the Unknown numbers. (NaN < -INF <
finite values < INF)
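
As a quick check of that reading, the following Python sketch pads the
same ten values with zeros or with a very large negative number instead
of unknowns. The nearest-rank rule and the function name are
assumptions for illustration only; the point is that any filler smaller
than the real data occupies the same low ranks the unknowns would.

```python
def nearest_rank(values, pct):
    """Nearest-rank percentile of a list of ordinary numbers."""
    ordered = sorted(values)
    # 1-based rank = ceil(pct * n / 100), computed with integer arithmetic.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


known = list(range(1, 11))

# Replace the 20 future samples with a known filler value.
with_zeros = known + [0] * 20
with_large_negative = known + [-10**9] * 20

print(nearest_rank(with_zeros, 90))           # 7
print(nearest_rank(with_large_negative, 90))  # 7
```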

It seems that the easiest method may be to pre-calculate the 95th
percentile on just the known data and pass the result to the graph as a
fixed value, though I would like to avoid the added overhead of opening
each RRD twice (these graphs can span up to 20 RRDs) if possible.
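
A rough sketch of that pre-calculation step in Python, assuming the
samples have already been fetched into a list with unknowns represented
as NaN (how they are read from the RRDs is left out). Both function
names are hypothetical, and the nearest-rank rule is an assumption. The
result is formatted as an HRULE argument that could be handed to
rrdtool graph.

```python
import math


def percentile_of_known(values, pct):
    """Nearest-rank percentile computed over the non-NaN samples only."""
    known = sorted(v for v in values if not math.isnan(v))
    if not known:
        return math.nan  # nothing known yet, so no percentile to report
    rank = max(1, (pct * len(known) + 99) // 100)
    return known[rank - 1]


def hrule_arg(value, color="FF0000", legend="95th percentile"):
    """Format a horizontal-line argument for an rrdtool graph command."""
    return f"HRULE:{value:g}#{color}:{legend}"


# Same simplified month as above: 10 known days, 20 unknown.
month = [float(v) for v in range(1, 11)] + [math.nan] * 20

p90 = percentile_of_known(month, 90)
print(p90)                                       # 9.0
print(hrule_arg(p90, legend="90th percentile"))  # HRULE:9#FF0000:90th percentile
```

The caller would need to skip the HRULE entirely when the result is
NaN, for example on the first day of a month before any data exists.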


