[rrd-users] Percentile off-by-one

Leander Koornneef l.koornneef at ic-s.nl
Thu Aug 20 12:37:25 CEST 2009


Hi all,

On Aug 19, 9:48 am, t... at oetiker.ch (Tobias Oetiker) wrote:
 > Hoi Joshua,
 >
 > Thursday Joshua Keroes wrote:
 > > How is percentile calculated? We've been calculating this manually, since
 > > before rrdtool offered a PERCENT function. We discovered that our manual
 > > calculation and the one supplied by rrdtool differ by one index. For example
 > > (percentile.pl program attached):
 >
 > > [Thu Aug 13, 19:58:41 | 692] $ percentile.pl -v --cf=AVERAGE --ds=ingress
 > > --dsi=0 --start=1242864000 --end=1245542399 etkfgl624456ubielg01g.rrd
 >
 > > percentile.pl using RRDs-1.2019
 > > etkfgl624456ubielg01g.rrd:
 > >         main::ptile_fetch()
 > >            start = 05/21/09
 > >            end = 06/20/09
 > >            rows = 31
 > >            95% of 31 = 29.45
 > >            95th %-ile row index = 29
 > >            discarded rows of -NaN's = 0
 > >         row 28 = *1126767.40*
 > >         row 29 = *1521699.32*
 > >         row 30 = 4132277.36
 > >    manual  calculation using fetch() = *1521699.32*
 > >    PERCENT calculation using graph() = *1126767.40*
 > >    difference ~ 394931 (26%)
 >
 > > Our manual calculation is grabbing row 29 and PERCENT is grabbing row 28.
 >
 > Hmm, the heart of the problem is that you have too small a sample
 > set. If the difference between picking one row or the next is 26%,
 > you should NOT be using percentile, since the results will be more
 > or less random.
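
For reference, the two row indices in Joshua's output (29 and 28) can be
reproduced from two different index conventions. The following is only a
minimal sketch in Python: the function names are made up for illustration,
and the second formula is an assumption about what would yield row 28, not
a statement about how rrdtool's PERCENT is actually implemented.

```python
def index_from_count(n, p):
    """0-based row index taken as floor(n * p / 100), for an integer percent p."""
    return (n * p) // 100


def index_from_span(n, p):
    """0-based row index taken as floor((n - 1) * p / 100).

    Assumed alternative convention, used here only because it reproduces row 28.
    """
    return ((n - 1) * p) // 100


rows = 31  # number of rows reported in the quoted output
print(index_from_count(rows, 95))  # 29, since 31 * 0.95 = 29.45
print(index_from_span(rows, 95))   # 28, since 30 * 0.95 = 28.5
```

With only 31 rows, the two conventions land on neighbouring rows, which is
exactly the one-index difference described above.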

I'm also seeing discrepancies between PERCENT as calculated by rrdtool
and the results from other tools, such as the Statistics::Descriptive
Perl module. I've also tested with Excel and Numbers (from Apple iWork).
The difference between outcomes appears to increase with larger
datasets. Here are the 90th percentile results from rrdtool and my own
script (which uses Statistics::Descriptive):

# of days    rrdtool    Statistics::Descriptive
1            2.25066    2.26135
10           1.37383    1.49129
20           1.30663    1.47212
30           1.48260    1.67608

This last dataset (30 days) consists of about 9000 values, and within
it the value 1.48260 is actually closer to the 87th percentile. Both
Excel and Numbers agree with Statistics::Descriptive here, so I'm
inclined to suspect a bug in rrdtool?
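
One way to check which percentile a given value corresponds to is to
count the share of samples at or below it. Below is a minimal sketch of
such a check in Python. The helper name `percent_rank` is hypothetical,
and the sample data is made up for illustration; it is not the 30-day
dataset described above.

```python
def percent_rank(values, x):
    """Return the percentage of samples that are less than or equal to x."""
    if not values:
        raise ValueError("values must not be empty")
    at_or_below = sum(1 for v in values if v <= x)
    return 100.0 * at_or_below / len(values)


# Made-up data for illustration only: the integers 1..100.
samples = list(range(1, 101))
print(percent_rank(samples, 87))  # 87.0
print(percent_rank(samples, 90))  # 90.0
```

Running the same kind of check on the value a tool reports as the 90th
percentile shows how far its result sits from the requested rank.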

Thanks,
Leander


