[rrd-users] NaN values in percentile calculations

Thu May 25 14:15:18 CEST 2017

Hi guys

I'm having an issue with some of the data we use for a customer’s billing, and I'm wondering if there is an elegant solution. What I really want is the ability to exclude NaN values from percentile calculations. Consider the following data series:

foo = 1,NaN,2,NaN,3,NaN,4,NaN,5,NaN,6,NaN,7,NaN,8,NaN,9,NaN,10,NaN

VDEF:90perc=foo,90,PERCENT

It seems that rrdtool includes the NaN values in the calculation, so I get 90perc=8. While technically correct for the series, it's not really useful for determining a real 90th percentile value in an everyday use case. Particularly with billing, it’s not reasonable to assume the traffic was 0 just because you have no data. Most likely the traffic profile, by “connecting the dots”, would actually have looked like this:

1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10

In this case I get 90perc=9, which means the billing would be skewed because of the loss of data. Now, it’s debatable what is mathematically the right thing to do here. But the bottom line is that, since we don’t know whether the NaN values would have been above or below the 90th percentile value, it’s better to exclude them than to assume they are below, IMHO.
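To make the two numbers easy to check, here is a small Python sketch of the behaviour I think I'm seeing. It is only a model, not rrdtool's actual code: the function name percentile_nan_low is made up for this example, and I'm assuming a nearest-rank percentile in which NaN sorts below every real value.

```python
import math

def percentile_nan_low(values, pct):
    """Nearest-rank percentile with NaN ranked below every real number.

    Assumed model of the observed behaviour, not rrdtool's implementation.
    Expects a non-empty list.
    """
    # Sorting raw NaNs is unreliable in Python, so map them to -inf for ordering.
    ordered = sorted(values, key=lambda v: -math.inf if math.isnan(v) else v)
    # Nearest rank: smallest value with at least pct% of the samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

nan = math.nan
foo = [1, nan, 2, nan, 3, nan, 4, nan, 5, nan,
       6, nan, 7, nan, 8, nan, 9, nan, 10, nan]
connected = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5,
             6, 6, 7, 7, 8, 8, 9, 9, 10, 10]

print(percentile_nan_low(foo, 90))        # 8: the ten NaNs occupy the bottom ranks
print(percentile_nan_low(connected, 90))  # 9
```

Under those assumptions the sketch gives 8 for the series with gaps and 9 for the "connected" series, matching the values above.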

The customer would also not be too happy, as including the NaN values always pushes the percentile value down by definition, and this means they might end up with slightly “incorrect” billing.
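For comparison, this is the behaviour I'm after, again as a rough Python sketch rather than anything rrdtool provides: the helper percentile_ignoring_nan is hypothetical, and the nearest-rank rule is my assumption.

```python
import math

def percentile_ignoring_nan(values, pct):
    """Nearest-rank percentile computed over the known (non-NaN) samples only.

    Hypothetical helper for illustration; not an rrdtool function.
    """
    known = sorted(v for v in values if not math.isnan(v))
    if not known:
        return math.nan  # no known samples, so there is nothing to rank
    # Nearest rank: smallest value with at least pct% of known samples at or below it.
    rank = max(1, math.ceil(pct * len(known) / 100))
    return known[rank - 1]

nan = math.nan
foo = [1, nan, 2, nan, 3, nan, 4, nan, 5, nan,
       6, nan, 7, nan, 8, nan, 9, nan, 10, nan]

print(percentile_ignoring_nan(foo, 90))  # 9
```

With the NaNs dropped, only the ten known samples are ranked and the result is 9 instead of the 8 that rrdtool reports for the same series.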

So does anyone know of a way to exclude those NaN values from the PERCENT calculation?

Thanks,
  Jacques