[rrd-users] Getting Total Bytes from COUNTER Data Source

Ryan Kubica kubicaryan at yahoo.com
Thu Dec 29 06:00:16 CET 2011

An RRA (aside from the initial step RRA) holds consolidated data.  You've created your RRD with some consolidation period (24 hours) and an xff, which is the fraction of PDPs (primary data points, one per step) that may be unknown while rrdtool still 'creates' the average (or total) in the RRA from the rest.

So ... if you have set xff to .5 then up to half of the datapoints used to create an RRA value can be missing, and rrdtool will 'pretend' the missing ones matched the average of the ones that are there.

For a 5 minute RRA and a total based on 1 minute data with an xff of .5 (total = average x 300 seconds):

1 min values:       3 3 4 5 6       = 4.2                = total 1,260
1 min and null:     3 3 null 5 6    = 4.25 ((3+3+5+6)/4) = total 1,275
1 min and 2 nulls:  3 3 null null 6 = 4 ((3+3+6)/3)      = total 1,200

Now the fun part:

1 min and 2 nulls
in the low datapoints:  null null 4 5 6 = 5 ((4+5+6)/3) = total 1,500
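
The arithmetic above can be reproduced with a short Python sketch. This is only an illustration of the averaging described here, not rrdtool's actual code: the function names are made up, None stands in for a missing datapoint, and the exact xff boundary test (strictly more than xff missing means unknown) is an assumption.

```python
import math

STEP = 60  # seconds per 1 minute datapoint, as in the example above


def consolidate(pdps, xff=0.5):
    """Average the known datapoints; None marks a missing one.

    Assumption: the result is unknown (NaN) once more than `xff` of the
    datapoints are missing. The exact boundary test is a guess here.
    """
    known = [v for v in pdps if v is not None]
    missing_fraction = (len(pdps) - len(known)) / len(pdps)
    if not known or missing_fraction > xff:
        return math.nan
    return sum(known) / len(known)


def total_from_average(pdps, xff=0.5, step=STEP):
    """Total as average * elapsed seconds, missing slots included."""
    return consolidate(pdps, xff) * step * len(pdps)


def total_from_known(pdps, step=STEP):
    """Total counting only the datapoints that were actually recorded."""
    return sum(v for v in pdps if v is not None) * step


if __name__ == "__main__":
    for row in ([3, 3, 4, 5, 6], [3, 3, None, 5, 6],
                [3, 3, None, None, 6], [None, None, 4, 5, 6]):
        print(row, f"avg={consolidate(row):.2f}",
              f"total={total_from_average(row):.0f}",
              f"known only={total_from_known(row):.0f}")
```

For the last row the averaged total comes out at 1,500 while the datapoints that were actually recorded only add up to 900, which is the skew being described.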

Based on the above example, you can see that any data missing in your 'trough' (the low traffic part of the day) will skew your daily Total high - but ONLY for your 24 hour RRA ... because your 24 hour RRA is applying this xff and the Total function is then operating on that value and multiplying it back out across the whole day.  It's doing exactly what you told it to do.

The real solution is to not use a 24 hour RRA, to ignore the first 24 hour set of data, or to never miss data. :-)  I'd side with not using 24 hour RRAs; they are a waste in many ways and obviously have inherent calculation error when they don't have all the data available.


From: Chris Mason <chris at noodles.org.uk>
To: Simon Hobson <linux at thehobsons.co.uk>; rrd-users at lists.oetiker.ch
Sent: Wednesday, December 28, 2011 10:28 AM
Subject: Re: [rrd-users] Getting Total Bytes from COUNTER Data Source

Actually, for data missing in the middle of a time period I would
prefer it to use averages - this makes sense.
The big problem I have is with missing data at the beginning of the
first step - a 24hr step RRA is always going to overestimate if you
start filling it sometime after midnight whereas the 5min step RRA

If I create an RRD at 23:00 then I would always have 23 hours of
overestimated data.


On 28 December 2011 18:17, Chris Mason <chris at noodles.org.uk> wrote:
>>>I am assuming the data that is missing at the beginning of the RRA
>>>would be considered UNKNOWN and I would expect the TOTAL function to
>>>ignore it?
>> You may think that ...
>> It depends on how it's calculated - the obvious calculation is average * time.
>> The average function ignores unknown periods, so the average of
>> unkn,1,2,3 would be 2 (6/3), not 1.5 (6/4). If you then multiply that
>> by the period (4 samples in this case), it would give you 8 instead
>> of 6.
> It comes down to whether you want unknowns to be 0 or the average.
> As you say, it comes down to what the TOTAL function does:
> If I had 'U,1,2,3' then the average would be 2 but I would hope the
> TOTAL function would use 3*2 to find the TOTAL.
> But other people might want it to estimate the missing values using
> the average - my interpretation is that if a value isn't there, then
> you can't count it.
>> It's an interesting debate as to which is more accurate !
>> NB - I don't know the actual calculation that's used. I'll leave
>> someone who knows the code to comment on that.
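
The two readings of 'U,1,2,3' debated in the quoted text can be put side by side in a small Python sketch. It makes no claim about which one rrdtool's TOTAL actually uses; the function names are hypothetical and NaN is used as a stand-in for UNKNOWN.

```python
import math

U = math.nan  # stand-in for an UNKNOWN sample


def known(samples):
    """Drop the unknown samples."""
    return [s for s in samples if not math.isnan(s)]


def total_unknown_as_average(samples):
    """average * number of samples: unknown slots count at the average."""
    k = known(samples)
    if not k:
        return math.nan  # nothing known, so nothing to estimate from
    return sum(k) / len(k) * len(samples)


def total_unknown_as_zero(samples):
    """Sum of the known samples only: unknown slots add nothing."""
    return sum(known(samples))


samples = [U, 1, 2, 3]
print(total_unknown_as_average(samples))  # 8.0
print(total_unknown_as_zero(samples))     # 6
```

The first function gives the 8 from "average * time", the second gives the 6 from "if a value isn't there, then you can't count it".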
