[rrd-users] RRDTool Aggregation Inaccuracies.

NHarris nicholas.kyle at gmail.com
Thu Jul 9 08:25:09 CEST 2009


Thanks Steve. Sadly, that's what I was afraid of.

Looks like I made a couple of inaccurate assumptions. To start, I made new
RRDs that didn't consolidate the data down (much), assuming I could just
hold 5-minute samples over the course of the year:

'/usr/bin/rrdtool create DATABASE_NAME
DS:audience:GAUGE:600:0:U RRA:MAX:0.5:1:105120'

Second, I assumed that the aggregation (via CDEF) was done first and THEN a
maximum was pulled from the new aggregate.
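
To make the difference between the two orders of operation concrete, here is
a minimal Python sketch. The sample values are made up for illustration and
do not come from the RRDs discussed here.

```python
# Two hypothetical sources sampled at the same four 5-minute steps.
a = [10, 40, 20, 5]
b = [30, 5, 25, 35]

# The assumed behaviour: sum the sources per step, then take the maximum.
per_step_sums = [x + y for x, y in zip(a, b)]   # [40, 45, 45, 40]
max_of_sum = max(per_step_sums)

# What summing MAX-consolidated values amounts to: each source is reduced
# to its own maximum first, and only then are the maxima added together.
sum_of_max = max(a) + max(b)

print(max_of_sum, sum_of_max)
```

The two results differ because the individual maxima occur at different
steps, so their sum describes a moment that never happened.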


I'm considering two options to bypass this problem, since the graphs
themselves appear fine. I can either

a) use PHP to fetch all data for a time period from each RRD in parallel,
sum up each 'row' and track the maximum value. Then, spit out this data
below the graphs, in good old-fashioned HTML. Or...

That's going to be a bit more intensive on the script. And an annoying chunk
of work :P
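
As a rough illustration of option (a), the sketch below uses Python rather
than PHP. It assumes the text printed by `rrdtool fetch` (a header line of
DS names, a blank line, then `timestamp: value` rows) has already been
captured into strings, one per RRD. The function names `parse_fetch` and
`peak_of_sum` are hypothetical helpers, not part of RRDtool.

```python
import math

def parse_fetch(text):
    """Parse captured `rrdtool fetch` output into {timestamp: first DS value}."""
    rows = {}
    for line in text.splitlines():
        ts, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip the DS-name header, blank lines, and anything malformed.
        if not sep or not ts.strip().isdigit() or not fields:
            continue
        rows[int(ts)] = float(fields[0])   # float() also accepts "nan"
    return rows

def peak_of_sum(tables):
    """Sum each row across all tables and return (timestamp, max sum)."""
    best_ts, best = None, float("-inf")
    # Only timestamps present in every table can be summed meaningfully.
    common = set(tables[0]).intersection(*tables[1:])
    for ts in sorted(common):
        values = [t[ts] for t in tables]
        if any(math.isnan(v) for v in values):
            continue   # a row with unknown data in any source is skipped
        total = sum(values)
        if total > best:
            best_ts, best = ts, total
    return best_ts, best
```

Skipping rows that contain unknown values is a design choice in this sketch;
treating unknowns as zero would be an alternative, at the cost of
understating the total for those rows.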

b) start up an n+1 RRD that will act as an aggregation RRD. Since all
sources are polled at the same time (or very close to it, in a sequence), I
can sum them up and in the final step, add that to the aggregate RRD.  That
way I have all the data, but it can still be manipulated and graphed
(properly).
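
A minimal sketch of the final step in option (b), again in Python: sum the
values just polled and build the argument list for `rrdtool update`. The
function name `aggregate_update_args` and the file name `aggregate.rrd` are
assumptions for illustration. The sketch only builds the command; it does
not run it.

```python
def aggregate_update_args(rrd_path, samples, timestamp="N"):
    """Build an `rrdtool update` argument list for the summed samples.

    `samples` holds the values polled from each source in this cycle;
    None marks a source that could not be read.
    """
    if any(s is None for s in samples):
        value = "U"              # store unknown rather than a partial sum
    else:
        value = str(sum(samples))
    # "N" asks rrdtool to use the current time for the update.
    return ["rrdtool", "update", rrd_path, f"{timestamp}:{value}"]

print(aggregate_update_args("aggregate.rrd", [12, 30, 5]))
```

The resulting list could be handed to something like `subprocess.run` once
the per-source updates for the same polling cycle have been done.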

This has the drawback of negating any past data, since I need to start from
scratch. Do you know of a way to merge RRDs that would fit this problem?


What are your thoughts on either of those approaches?  At any rate, thank
you muchly for the reply.

-Nick


On Wed, Jul 8, 2009 at 10:43 PM, S Shipway (via Nabble) <
ml-user+68162-155771166 at n2.nabble.com> wrote:

> I suspect this may be the visible result of the statistical fact that
> although
>
> Avg(a) + Avg(b) == Avg(a+b)
>
> you should note that
>
> Max(a) + Max(b) != Max(a+b)
>
> where a and b are elements iterated over the same time series.  I get this
> problem when graphing CPU usage split into user/system/wait and then trying
> to take a maximum by summing the maxima of the separate datasources,
> whereupon I find a value greater than 100% as soon as I start to use an RRA
> with >1 dp per cdp.
>
> Maybe what I just said was all greek to you :)
>
> As RRDTool rolls up the data to form the weekly etc RRAs, it will summarise
> the data points.  At this point, you can no longer simply sum the maxima
> because of the second equation above.  When you look at the Daily graph, it
> works, since the CDPs (consolidated data points in the RRA) are formed from
> just a single DP (data point) and so Max(a)==Avg(a) , which means the sum
> works.
>
> In summary, it's inaccurate because you can't do this sort of calculation
> with consolidated data.  Hope this made sense, I'm full of 'flu germs today
> and not too clear
>
> Steve
