[rrd-developers] interleaving RRD files, RRDfs perhaps?

Fri Oct 3 10:57:34 CEST 2008

----- Original Message ----- 
From: <Daniel.Pocock at barclayscapital.com>
To: <tobi at oetiker.ch>
Cc: <rrd-developers at lists.oetiker.ch>
Sent: Friday, October 03, 2008 10:24 AM
Subject: Re: [rrd-developers] interleaving RRD files, RRDfs perhaps?

>
>> Normally the filesystem should be able to take care of quite
>> a lot of optimization duties as long as you present it with a
>> suitable workload. So maybe there is some low-hanging fruit
>> to be harvested by looking at when and in what order
>> rrdcached does its updates and also maybe how they are
>> performed by the rrd update code, without going to a raw device ...
>>
>
> I'm certainly studying that as well - the interleaving would work at a
> lower level than rrdcached, and would allow the block-level caching
> systems to have a higher chance of success.
>
>> I am pretty sure though that you can easily devise a much more
>> performant system if you start limiting the way data can be
>> delivered (e.g. all rrds look the same)
>
> That is certainly the case with some applications, e.g. Ganglia.
>>
>> Looking at the existing code, did you investigate creating
>> rrd files with a LOT of DSes ?
>
> Yes, that seems like a possibility - but many applications, such as
> Ganglia, create new RRDs on the fly.  Therefore, there would need to be
> some way of creating an optimal number of empty DSes in advance, and
> then mapping them to application-specific names when they are needed.
> This logic could be implemented in the app or rrdtool.
>
> Implementing within rrdtool in a way that preserves the existing rrdtool
> API seems to be desirable - perhaps the different operating mode could
> be enabled by some kind of environment variable and/or an rrdtool
> configuration file in /etc?

Combining a lot of data into one large RRD has the disadvantage that all 
DSes need to be written to in one go and with the same timestamp.

How about separating the actual data from the housekeeping?

Keep just one copy of stat_head, ds_def and rra_def in a separate file.
These are removed from each RRD file and replaced by a pointer to live_head,
pdp_prep and cdp_prep in that separate file.
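
A rough sketch of how such a pointer could be resolved, in Python for brevity. This is only one reading of the idea: the sizes, the on-disk layout, and the names HEADER_SIZE, RECORD_SIZE, record_offset and read_record are all made up for illustration and do not come from the RRDtool source.

```python
import io
import struct

# All sizes below are assumptions for illustration, not real RRDtool values.
HEADER_SIZE = 64   # shared copy of stat_head, ds_def, rra_def
RECORD_SIZE = 32   # one RRD's live_head + pdp_prep + cdp_prep
POINTER = struct.Struct("<Q")  # slot number stored at the start of each data file


def record_offset(slot):
    """Byte offset of a slot's record inside the housekeeping file."""
    return HEADER_SIZE + slot * RECORD_SIZE


def read_record(data_file, housekeeping):
    """Follow the pointer in a data file to its housekeeping record."""
    data_file.seek(0)
    (slot,) = POINTER.unpack(data_file.read(POINTER.size))
    housekeeping.seek(record_offset(slot))
    return housekeeping.read(RECORD_SIZE)


# In-memory stand-ins for the two files: three records, data file uses slot 2.
housekeeping = io.BytesIO(bytes(HEADER_SIZE) + b"A" * 32 + b"B" * 32 + b"C" * 32)
data_file = io.BytesIO(POINTER.pack(2) + b"...RRA rows...")
record = read_record(data_file, housekeeping)
```

With fixed-size records the pointer can be a plain slot number, so finding the housekeeping data costs one multiplication and one seek.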

For this to be a possible solution, several reads and writes need to be
combined into one. This probably means the front-end needs to know which
RRDs to update together, or the user needs to accept some caching and thus
some slight risk of data loss.  And either way this only makes sense when
RRDtool is run as a daemon.
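
To illustrate the caching variant, here is a minimal sketch that collects pending housekeeping records in memory and merges neighbouring slots into single writes. The function name coalesce and the slot-keyed dictionary are assumptions for the example, and anything still sitting in the dictionary is lost if the daemon dies before flushing.

```python
def coalesce(pending):
    """Group {slot: record_bytes} into (first_slot, joined_bytes) runs.

    Adjacent slots end up in one run, so each run needs only one write.
    """
    runs = []
    for slot in sorted(pending):
        # The next expected slot is the run's first slot plus its record count.
        if runs and slot == runs[-1][0] + len(runs[-1][1]):
            runs[-1][1].append(pending[slot])
        else:
            runs.append((slot, [pending[slot]]))
    return [(first, b"".join(parts)) for first, parts in runs]


# Five cached updates collapse into two contiguous writes.
pending = {5: b"e", 1: b"a", 2: b"b", 3: b"c", 6: b"f"}
writes = coalesce(pending)
```

How much this helps depends on how often RRDs that are updated together also sit next to each other in the housekeeping file.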

Changes to the API would not be necessary. In fact, changes would be
minimal. The way pointers are computed will change, and so will the file
they point into, but the actual process stays the same.  The only new part
would be a way of determining which location in the housekeeping file is
available, and of expanding the file if necessary (probably in chunks of
128 or 1024).
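
One way the slot bookkeeping could look, as a sketch only. The class SlotAllocator and its methods are hypothetical, and it tracks slots in memory where a real implementation would also have to persist this and grow the file on disk.

```python
import heapq


class SlotAllocator:
    """Tracks free slots in a housekeeping file that grows in fixed chunks."""

    def __init__(self, chunk=128):
        self.chunk = chunk    # slots added per expansion
        self.capacity = 0     # total slots the file currently holds
        self.free = []        # min-heap of free slot numbers
        self.in_use = set()   # slots currently handed out

    def allocate(self):
        """Return the lowest free slot, expanding by one chunk when full."""
        if not self.free:
            self._expand()
        slot = heapq.heappop(self.free)
        self.in_use.add(slot)
        return slot

    def release(self, slot):
        """Make a slot available again, e.g. when an RRD is deleted."""
        if slot not in self.in_use:
            raise ValueError(f"slot {slot} is not allocated")
        self.in_use.remove(slot)
        heapq.heappush(self.free, slot)

    def _expand(self):
        start = self.capacity
        self.capacity += self.chunk
        for slot in range(start, self.capacity):
            heapq.heappush(self.free, slot)


# A small chunk size keeps the example readable.
alloc = SlotAllocator(chunk=4)
slots = [alloc.allocate() for _ in range(5)]  # the fifth request forces an expansion
```

Handing out the lowest free slot first keeps the used part of the file compact, which should suit block-level caching.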