<html><body><div style="color:#000; background-color:#fff; font-family:arial, helvetica, sans-serif;font-size:10pt"><div><span><br></span></div><div><span><div><span>An RRA (aside from the initial step RRA) is a consolidation of your step data. You've created your RRD with some consolidation period (24 hours) and an xff, which is the fraction of PDPs (step datapoints) in that period that may be unknown while rrdtool still 'creates' the average (or total) for the RRA.</span></div><div><span><br></span></div><div><span>So ... if you have set xff to .5, then up to half of the datapoints used to create an RRA value can be missing, and rrdtool will still produce a value as if the rest had been there, because your xff value allows it.</span></div><div><span><br></span></div><div><span>For a 5-minute RRA and a total based on 1-minute data with an xff of .5 (total = average * 300 seconds):</span></div><div><span><br></span></div><div><span>1 min values:<span class="Apple-tab-span" style="white-space: pre; ">                </span>3 3 4 5 6 = 4.2 = <span style="font-weight:
bold;">total 1,260</span></span></div><div><span>1 min and null:<span class="Apple-tab-span" style="white-space: pre; ">                </span>3 3 null 5 6 = 4.25 ((3+3+5+6)/4) = total 1,275</span></div><div><span>1 min and 2 nulls:<span class="Apple-tab-span" style="white-space: pre; ">        </span>3 3 null null 6 = 4 ((3+3+6)/3) = total 1,200</span></div><div><span><br></span></div><div><span>Now the fun part:</span></div><div><span><br></span></div><div><span>1 min and nulls</span></div><div><span>replacing the low datapoints:<span class="Apple-tab-span" style="white-space: pre; ">        </span>null null 4 5 6 = 5 ((4+5+6)/3) =<span style="font-weight: bold;"> total 1,500</span></span></div><div><span><br></span></div><div><span><br></span></div><div>Based on the above example, you can see that any data missing in your trough (the low-traffic part of the day) will skew your daily Total high, but ONLY for your 24-hour RRA ... because your 24-hour RRA is applying the xff and then the Total function is
operating on that averaged value and multiplying it back out across the whole day. It's doing exactly what you told it to do.</div><div><br></div><div>The real solution is to not use a 24-hour RRA, to ignore the first 24-hour set of data, or to never miss data. :-) I'd side with not using 24-hour RRAs; they are a waste in many ways and have inherent calculation error whenever they don't have all the data available.</div><div><br></div><div>-Ryan</div><div><br></div></span></div><div><br></div> <div style="font-size: 10pt; font-family: arial, helvetica, sans-serif; "> <div style="font-size: 12pt; font-family: 'times new roman', 'new york', times, serif; "> <font size="2" face="Arial"> <hr size="1"> <b><span style="font-weight:bold;">From:</span></b> Chris Mason &lt;chris@noodles.org.uk&gt;<br> <b><span style="font-weight: bold;">To:</span></b> Simon Hobson &lt;linux@thehobsons.co.uk&gt;; rrd-users@lists.oetiker.ch <br> <b><span
style="font-weight: bold;">Sent:</span></b> Wednesday, December 28, 2011 10:28 AM<br> <b><span style="font-weight: bold;">Subject:</span></b> Re: [rrd-users] Getting Total Bytes from COUNTER Data Source<br> </font> <br>
Actually, for data missing in the middle of a time period I would<br>prefer it to use averages - this makes sense.<br>The big problem I have is with missing data at the beginning of the<br>first step - a 24hr step RRA is always going to overestimate if you<br>start filling it sometime after midnight whereas the 5min step RRA<br>won't.<br><br>e.g.<br> if I create an RRD at 23:00 then I would always have 23 hours of<br>overestimated data.<br><br>/Chris<br><br>On 28 December 2011 18:17, Chris Mason &lt;<a ymailto="mailto:chris@noodles.org.uk" href="mailto:chris@noodles.org.uk">chris@noodles.org.uk</a>&gt; wrote:<br>>>>I am assuming the data that is missing at the beginning of the RRA<br>>>>would be considered UNKNOWN and I would expect the TOTAL function to<br>>>>ignore it?<br>>><br>>> You may think that ...<br>>> It depends on how it's calculated - the obvious calculation is average * time.<br>>> The
average function ignores unknown periods, so the average of<br>>> unkn,1,2,3 would be 2 (6/3), not 1.5 (6/4). If you then multiply that<br>>> by the period (4 samples in this case), it would give you 8 instead<br>>> of 6.<br>><br>> It comes down to whether you want unknowns to be 0 or the average.<br>> As you say, it comes down to what the TOTAL function does:<br>><br>> If I had 'U,1,2,3' then the average would be 2 but I would hope the<br>> TOTAL function would use 3*2 to find the TOTAL.<br>> But other people might want it to estimate the missing values using<br>> the average - my interpretation is that if a value isn't there, then<br>> you can't count it.<br>><br>>> It's an interesting debate as to which is more accurate !<br>>><br>>> NB - I don't know the actual calculation that's used. I'll leave<br>>> someone who knows the code to comment on
that.<br><br> </div> </div> </div></body></html>