[rrd-users] Aggregate/combine several RRDs

Fri Jun 20 10:59:58 CEST 2008

On Fri, Jun 20, 2008 at 09:41:18AM +0200, Raimund Berger wrote:

> The question is still of technical interest though, whether to create
> intermediate RRDs for this kind of usage. This basically boils down to
> the computing versus storage cost question I guess, plus the issue of
> creating an additional point of error/failure.

Creating another RRDfile isn't hard to do.  Filling it with numbers
isn't hard either.  The biggest challenge is to get your minimum and
maximum rates correct.

When reading a previous RRD (e.g. with fetch) you will get rates.
This probably means your new RRD should have data source type GAUGE.

If you need to be able to set minimum, maximum and average, you will
need to compute a combination of three updates per interval that
gets all three exactly right.

Thinking out loud, hoping that someone else can point this group
to a better algorithm or knows shortcuts:

Goal: set Rmin, Ravg, Rmax (minimum,average,maximum rate) for one
RRDtool interval (e.g. in the RRA which has 86400 seconds per CDP,
two years back).  Obviously this means heartbeat should allow for
such a large interval.

Take Ravg as the basis. Needed: find two intervals Tmin and Tmax
so that Rmin*Tmin + Rmax*Tmax equals Ravg*(Tmin+Tmax). In addition
to this, Tmin+Tmax cannot be larger than Tavg (the entire interval
duration).

Start by setting both Tmin and Tmax to half Tavg.

If (Rmin+Rmax)/2 < Ravg {
    decrease Tmin, but no shorter than 1 second.
    If (Rmin*Tmin+Rmax*Tmax) is still < Ravg*(Tmin+Tmax), increase
    Tmax. It should never happen that Tmax becomes >= Tavg. (error
    condition)
} elseif (Rmin+Rmax)/2 > Ravg {
    decrease Tmax, but no shorter than 1 second.
    If (Rmin*Tmin+Rmax*Tmax) is still > Ravg*(Tmin+Tmax), increase
    Tmin. It should never happen that Tmin becomes >= Tavg. (error
    condition)
}
The new Tavg becomes Tavg-(Tmin+Tmax).  If it is <0 then there has
to be something wrong in the source RRD.  Abort.
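
The "decrease" and "increase" steps above do not need a loop. The
balance condition Rmin*Tmin + Rmax*Tmax = Ravg*(Tmin+Tmax) rearranges
to Tmin*(Ravg-Rmin) = Tmax*(Rmax-Ravg), so one width follows directly
from the other. Below is a minimal Python sketch of that shortcut. It
is a closed-form variant, not something taken from RRDtool itself;
the names split_interval and t_floor are made up for illustration,
and the widths come back as floats, so rounding to whole seconds is
left open.

```python
def split_interval(r_min, r_avg, r_max, t_avg, t_floor=1.0):
    """Return (t_min, t_max, t_rest) so that the time-weighted mean of
    r_min, r_max and r_avg over those widths equals r_avg.

    t_floor is the shortest width allowed (1 second in the text above).
    """
    if not (r_min <= r_avg <= r_max):
        raise ValueError("expected r_min <= r_avg <= r_max")

    half = t_avg / 2.0
    low_gap = r_avg - r_min    # how far the minimum sits below the average
    high_gap = r_max - r_avg   # how far the maximum sits above the average

    if low_gap == high_gap:
        # Already balanced (this includes the flat case r_min == r_max).
        t_min = t_max = half
    elif low_gap == 0 or high_gap == 0:
        # The average equals one extreme but not the other: no pair of
        # positive widths can balance, so the source data is suspect.
        raise ValueError("average equals one extreme but not the other")
    elif low_gap > high_gap:
        # (r_min + r_max) / 2 < r_avg: shrink t_min.
        t_max = half
        t_min = t_max * high_gap / low_gap
        if t_min < t_floor:
            # Cannot shrink further, so widen t_max instead.
            t_min = t_floor
            t_max = t_min * low_gap / high_gap
    else:
        # (r_min + r_max) / 2 > r_avg: shrink t_max.
        t_min = half
        t_max = t_min * low_gap / high_gap
        if t_max < t_floor:
            t_max = t_floor
            t_min = t_max * high_gap / low_gap

    t_rest = t_avg - (t_min + t_max)
    if t_rest < 0:
        raise ValueError("widths exceed the interval; check the source RRD")
    return t_min, t_max, t_rest
```

For example, split_interval(0, 30, 40, 86400) gives widths of 14400,
43200 and 28800 seconds, and 0*14400 + 40*43200 + 30*28800 equals
30*86400.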

Now perform two or three updates:
* update with Rmin for Tmin
* update with Rmax for Tmax
* if Tavg is not zero, update with Ravg for Tavg.
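
As a rough sketch of those updates in Python: with a GAUGE data
source, the value given at time T is taken as the rate for the period
since the previous update, so each update is stamped with the END of
its slice. The helper name update_args is hypothetical, and the sketch
assumes the previous update to the target RRD landed exactly on
`start` and that all widths are whole seconds. The strings it returns
are in the timestamp:value form that "rrdtool update" accepts.

```python
def update_args(start, r_min, t_min, r_max, t_max, r_avg, t_rest):
    """Build 'timestamp:value' strings for the two or three updates.

    start is the epoch second at which the interval begins; each
    update is stamped with the end of the slice it describes.
    """
    updates = []
    now = start
    for rate, width in ((r_min, t_min), (r_max, t_max), (r_avg, t_rest)):
        if width <= 0:
            # Nothing left to fill (the "if Tavg is not zero" case).
            continue
        now += width
        updates.append(f"{now}:{rate}")
    return updates


# Illustrative numbers only: one day, min 0, max 40, average 30.
for arg in update_args(1213920000, 0, 14400, 40, 43200, 30, 28800):
    print(arg)
```

The last timestamp should land exactly on start plus the interval
length; if it does not, the widths do not add up.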

To visualize this algorithm:
Think of a rectangle "A", Ravg high and Tavg wide.  Overlaid are two
other rectangles "B" and "C": Rmin high half Tavg wide, and Rmax high
half Tavg wide.  In the process a rectangle "D" can be created, Ravg
high and just wide enough so that the combined width of B+C+D equals
the width of A.

The algorithm looks at the average height of the two smaller rectangles.
If it happens to be just fine, the if-then-else does nothing.

If the average is too low, rectangle B needs to become narrower.
If it cannot be made narrow enough, rectangle C needs to be wider.

Similarly: if the average is too high, rectangle C needs to be made
narrower and, if still not good, B needs to be wider.

If the combined width of B and C is larger than that of A, then I
believe something has to be wrong in the original data.  If it is
smaller, then just fill the remaining time with rectangle D which,
due to its height, has no impact on the average.

What do you think?  Am I on the right track here?  Has someone
already implemented this or a similar technique?

And should we discuss this here or on rrd-developers?

-- 
Alex van den Bogaerdt
http://www.vandenbogaerdt.nl/rrdtool/