[rrd-users] RRA/RRD tuning

Mon Aug 13 02:08:00 CEST 2012

Hey Tobi.

Well actually this would belong on the rrd-devel list but as we've
already started here...

On Sun, 2012-08-12 at 16:44 +0200, Tobias Oetiker wrote:
> > Ok... if all that is correct... I think it would make perhaps sense to
> > add this somewhere to the documentation, doesn't it?
> > At least I couldn't directly read out that information from what I've
> > found :)
> patches are always welcome ... especially if they help other people
> understand how rrdtool works

Attached is a file with some elaborate discussion on:
- Overview on RRAs
- Tuning of RRDs via its RRAs

What's still missing:
- We might add which CF is faster...
I mean MAX/MIN are likely just comparisons and are therefore a tiny bit
faster (CPU-wise) than e.g. AVERAGE (which I guess includes some adding
and dividing?).

- any further information, especially tuning ideas via the RRAs and their
settings, or any other settings within the RRDs themselves

- your consent that it's correct ;-)

Are you going to add/format it for the manpages? If I should do it,
please tell me which (or whether it should become a new one, and which
name then).

Cheers,
Chris.
-------------- next part --------------
Overview on RRAs:
An RRD consists of one or more RRAs, each of which covers a certain time span with a certain granularity.

1) Time Spans
The time spans of all RRAs end at the same point in time (that is, at the most recent point in time).
But each RRA may reach back a different amount of time. In other words: the time span of each RRA may begin at a different point in time.
This can be visualised like this:
⇠──────────────── farther back in time
              ╭── time of the last update (“now”)
  ┌───────────┐
  │           │ RRA #1
  └───────────┘
        ┌─────┐
        │     │ RRA #2
        └─────┘
           ┌──┐
           │  │ RRA #3
           └──┘

The (current) end of the time span of all RRAs in a given RRD is set at each update (see rrdupdate) to the time of the most recent data point.
The (current) beginning of the time span of each RRA in a given RRD follows from its length, which is defined when creating the RRD (see rrdcreate) via the steps and rows arguments of the RRA: option as well as the value of the --step option of the RRD.
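
As a rough illustration of how the length follows from these three values, here is a minimal Python sketch. The helper name rra_time_span and the sample numbers are made up for this example. The only things taken from rrdcreate are that one row of an RRA covers steps times --step seconds and that an RRA keeps rows such rows.

```python
def rra_time_span(step, steps, rows):
    """Return the length of an RRA's time span in seconds.

    step  -- value of the --step option of the RRD (seconds)
    steps -- steps argument of the RRA: option
    rows  -- rows argument of the RRA: option
    """
    return step * steps * rows


# Hypothetical RRD created with --step 300 and two RRAs.
one_day = rra_time_span(300, 1, 288)      # e.g. RRA:AVERAGE:0.5:1:288
one_year = rra_time_span(300, 288, 365)   # e.g. RRA:AVERAGE:0.5:288:365

print(one_day // 3600, "hours")
print(one_year // 86400, "days")
```

Both RRAs end at the time of the last update; the first one begins one day before it, the second one 365 days before it.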

2) Granularities
The RRAs of a given RRD may effectively have different granularities, that is, different numbers of data points in a fixed time span.
This can be visualised like this:
⇠──────────────── farther back in time
              ╭── time of the last update (“now”)
  ┌───────────┐
  │· · · · · ·│ RRA #1
  └───────────┘
        ┌─────┐
        │·····│ RRA #2
        └─────┘
           ┌──┐
           │┈┈│ RRA #3
           └──┘
In the above example, the granularity decreases with the length of the time span.

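The granularity of an RRA in a given RRD follows from the length of its time span and the number of data points in it, the latter of which is defined when creating the RRD (see rrdcreate) via the rows argument of the RRA: option. Equivalently, each data point of an RRA covers steps times the value of the --step option, where steps is the steps argument of the RRA: option.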
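
The relation can be sketched in a few lines of Python, assuming (as described in rrdcreate) that one data point of an RRA covers steps times --step seconds. The function names and the sample values are hypothetical and only serve as illustration.

```python
def rra_resolution(step, steps):
    """Seconds covered by a single data point of an RRA."""
    return step * steps


def points_per_day(step, steps):
    """Granularity expressed as data points per day (86400 seconds)."""
    return 86400 / rra_resolution(step, steps)


# Hypothetical RRD with --step 300 and three RRAs of decreasing granularity.
for steps in (1, 12, 288):
    print(steps, rra_resolution(300, steps), points_per_day(300, steps))
```

With these sample values the three RRAs hold one data point per 5 minutes, per hour and per day, respectively.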

Tuning of RRDs via their RRAs:
RRDs can be tuned via the number of their RRAs and the settings of these.
Tuning is mainly done from the following points of view:
- size of the RRD
- time needed for updates (see rrdupdate)
- time needed to retrieve and process data, mainly with respect to graphing (see rrdgraph) but also fetching (see rrdfetch) of data

There is no ideal solution to meet all of the above goals. How to tune (especially, in which direction) always depends on the respective usage scenarios.

The following presents general concepts on tuning:
1) Tuning of the size of an RRD
Tuning (that means reducing) the required size of an RRD is in principle easy:
- the fewer RRAs, the less space is required
- the fewer data points per RRA, the less space is required
  This of course affects (or is affected by) the length of an RRA's time span as well as its granularity.

Another important way to reduce the size is to lower the granularity:
In many scenarios older data needs to be less detailed than recent or current data.
This is the typical use case for having multiple RRAs: some with high granularity but short time spans and some with low granularity but long time spans.
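
To give a feeling for the effect, the following Python sketch compares a layered set of RRAs with a single high-granularity RRA covering the same overall time span. It is only an estimate under stated assumptions: it assumes that each row stores one 8-byte value per data source and it ignores the file header and any other overhead. The helper names and the chosen RRA layout (for an RRD with --step 300) are made up for this example.

```python
BYTES_PER_VALUE = 8  # assumption: one double per data source and row


def rra_spec(cf, xff, steps, rows):
    """Format an RRA definition as accepted by rrdtool create."""
    return "RRA:%s:%s:%d:%d" % (cf, xff, steps, rows)


def data_bytes(rras, n_ds=1):
    """Estimate the space taken by the rows of all RRAs (header ignored)."""
    total_rows = sum(rows for (_cf, _xff, _steps, rows) in rras)
    return total_rows * n_ds * BYTES_PER_VALUE


# Layered: one day at 5 minutes, one week at 1 hour, one year at 1 day.
layered = [
    ("AVERAGE", 0.5, 1, 288),
    ("AVERAGE", 0.5, 12, 168),
    ("AVERAGE", 0.5, 288, 365),
]
# Flat: one year at 5 minutes in a single RRA.
flat = [("AVERAGE", 0.5, 1, 288 * 365)]

for rra in layered:
    print(rra_spec(*rra))
print(data_bytes(layered), "bytes vs.", data_bytes(flat), "bytes")
```

Under these assumptions the layered layout needs roughly a hundredth of the space of the flat one, at the price of older data being less detailed.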

2) Tuning of the time needed for updates
Generally, the fewer RRAs, the less time is required for updates (as fewer RRAs must be updated).

There are of course many further methods to tune update times (that go beyond the tuning of RRDs via their RRAs), ranging over the wide field of (storage) IO tuning.

3) Tuning of the time needed to retrieve and process data, mainly with respect to graphing
When a given RRD has multiple RRAs, rrdgraph is smart enough to retrieve data from the RRA that fits best, which can be used to tune the graphing.

Consider the following example:
⇠──────────────── farther back in time
        ╭──────── the smallest common time span (e.g. “one year ago”)
        │     ╭── time of the last update (“now”)
  ┌───────────┐
  │┈┈┈┈┈┈┈┈┈┈┈│ RRA #1
  └───────────┘
        ┌─────┐
        │·····│ RRA #2
        └─────┘
There are two RRAs, with RRA #1 having a higher granularity than RRA #2 and also going farther back in time than the latter.
If a graph within the time span of RRA #2 (“one year back”) is to be generated, rrdgraph may choose between both RRAs.
When the resolution of the graph is “low enough” that the data points in RRA #2 suffice to fill it (without interpolating any data), rrdgraph will select RRA #2 for the graphing.
As less data has to be read (and perhaps interpolated/scaled to an even smaller range), this will be faster than if RRA #1 had been used.

Of course, if the desired graph lies outside of RRA #2’s time span or if the resolution of the graph is “high enough” that the data points of RRA #2 wouldn't suffice, RRA #1 will be used.
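
The selection described above can be modelled roughly as follows. This is a simplified sketch of the idea only and not the actual algorithm used by rrdgraph; the function pick_rra, the tuple layout and the sample numbers are assumptions made for this example.

```python
def pick_rra(rras, window, wanted_resolution):
    """Pick an RRA for a graph.

    rras              -- list of (name, resolution, span), both in seconds
    window            -- how far back the graph reaches, in seconds
    wanted_resolution -- seconds per data point the graph needs
    """
    covering = [r for r in rras if r[2] >= window]
    if not covering:
        # No RRA reaches back far enough: take the longest one.
        return max(rras, key=lambda r: r[2])
    fine_enough = [r for r in covering if r[1] <= wanted_resolution]
    if fine_enough:
        # Coarsest RRA that still suffices, so the least data is read.
        return max(fine_enough, key=lambda r: r[1])
    # Nothing is detailed enough: take the most detailed one.
    return min(covering, key=lambda r: r[1])


YEAR = 365 * 86400
rras = [
    ("RRA #1", 300, 2 * YEAR),   # detailed, reaches back two years
    ("RRA #2", 86400, YEAR),     # one point per day, reaches back one year
]

print(pick_rra(rras, YEAR, 86400)[0])          # low resolution, one year
print(pick_rra(rras, YEAR, 3600)[0])           # high resolution, one year
print(pick_rra(rras, 2 * YEAR, 2 * 86400)[0])  # beyond RRA #2's time span
```

In this model only the first request is served from RRA #2; the other two fall back to RRA #1, as in the text above.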

To summarise: by adding (possibly several layers of) less detailed RRAs of the same or a smaller time span (as RRA #2 was in the example above), one collects data which will be less detailed and even less exact, but which allows for faster graphing.

Of course (as laid out above in (2)) this will come at the cost of update speed.

There are of course many further methods to tune retrieval times, that is, the time needed for reading the data (that go beyond the tuning of RRDs via their RRAs), ranging over the wide field of (storage) IO tuning.
