<br><font size=2 face="sans-serif">See this link for a closer look at
the problem:</font>
<br>
<br><a href="https://lists.oetiker.ch/pipermail/rrd-users/2008-January/013582.html"><font size=2 face="sans-serif">https://lists.oetiker.ch/pipermail/rrd-users/2008-January/013582.html</font></a>
<br><font size=2 face="sans-serif"><br>
Joe Loiacono<br>
</font>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">From:</font>
<td><font size=1 face="sans-serif">"Alex van den Bogaerdt" &lt;alex@vandenbogaerdt.nl&gt;</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">To:</font>
<td><font size=1 face="sans-serif">&lt;rrd-users@lists.oetiker.ch&gt;</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Date:</font>
<td><font size=1 face="sans-serif">05/12/2010 03:09 AM</font>
<tr valign=top>
<td><font size=1 color=#5f5f5f face="sans-serif">Subject:</font>
<td><font size=1 face="sans-serif">Re: [rrd-users] RRD PERCENT question
(95 percentile)</font></table>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>> Hello List,<br>
><br>
> I'm a happy RRD user, but there's something I need help with.<br>
> We're currently moving to RRDtool for accounting & billing.<br>
> For this I've created RRD files that keep 105120 5-minute samples (1
year).<br>
><br>
> Now I'm comparing the 95% numbers generated by RRDtool with the 95%<br>
> number generated by our old script. The problem is that these numbers<br>
> are significantly different.<br>
<br>
Note that "the" 95th percentile does not exist. There are many different
methods of computing this value, and although they are similar and will more
or less provide the same result, they do differ.<br>
<br>
I don't recall exactly how I started, but I think I originally used
data[n*steps/100]. Then, after some discussion on the mailing list, round()
was introduced.<br>
<br>
> I hope someone can help me understand why these numbers are so different.<br>
<br>
Because if the array index changes by only one, the returned value may be
quite different.<br>
<br>
> This is how I determine the 95 percentile number using RRDtool<br>
><br>
[snipped some]<br>
> VDEF:95thin=inbits,95,PERCENT \<br>
> VDEF:95thout=outbits,95,PERCENT \<br>
<br>
Looking fine.<br>
<br>
> My manual test was done like this:<br>
> 1) fetch rawdata:<br>
> /usr/local/rrdtool-1.2.19/bin/rrdtool fetch \<br>
> --start '1271894400' --end '1272240000' \<br>
> "deviceid11_XXX_Transit.rrd" AVERAGE > OUT_RAW;<br>
><br>
> 2) read this data with a Perl script, then sort the values and show the 95%
number.<br>
><br>
> In this case the data set contains 1153 samples (no NaN in sample).<br>
> so after sorting, the 95th percentile should be the value (times 8 for<br>
> bits) on position 1096.<br>
<br>
> The problem is that this number is quite different (lower) from what is<br>
> returned using PERCENT above.<br>
<br>
Please check whether the number at position 1095 or 1097 equals what
rrdtool finds.<br>
<br>
> Note that this sample does not contain any NaN values.<br>
> I also tried this with the latest version of RRDtool, same result.<br>
><br>
> Can anyone explain why this is different? Is this expected?<br>
> How exactly does RRD do this internally?<br>
<br>
Create an array, fill it with the data, use qsort and then find the correct
spot:<br>
<br>
qsort(array, step, sizeof(double), vdef_percent_compar);<br>
field = round((dst->vf.param * (double)(steps - 1)) / 100.0);<br>
dst->vf.val = array[field];<br>
<br>
Here, vdef_percent_compar is a comparison function that sorts NAN &lt; -INF &lt;
numbers &lt; +INF.<br>
<br>
Your calculation: 1153 samples, 95% of 1153 = 1095.35, so you take position 1096.<br>
RRDtool: round(95*1152/100) = round(1094.4) = 1094, on an array whose first
element is at index 0, so index 1094 is the 1095th position.<br>
If I recall correctly, the original version used truncation instead of
rounding, which makes no difference in this case (1094.4 truncates to 1094 as well).<br>
<br>
Anyway, unless I made a mistake here, rrdtool takes data[1094] and you take
data[1096]. Since the array is sorted in ascending order, your returned value
should be at least as high as what RRDtool reports.<br>
<br>
> I would like to use RRDtool for this, but I need to be sure that the<br>
> numbers are correct, i.e. understand why the numbers differ from those<br>
> calculated manually.<br>
<br>
I would also be concerned about why the result is the opposite of what I reasoned above.<br>
<br>
_______________________________________________<br>
rrd-users mailing list<br>
rrd-users@lists.oetiker.ch<br>
</font></tt><a href="https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users"><tt><font size=2>https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users</font></tt></a><tt><font size=2><br>
</font></tt>
<br>
<br>