[rrd-developers] storage back-end - more thoughts

Mon Oct 27 22:56:34 CET 2008

On Sat, Oct 25, 2008 at 12:53:11PM +0100, Daniel Pocock wrote:
> The approach that I am currently using works on the assumption that all
> RRDs have the same RRAs in the same order (a good assumption with the
> current version of Ganglia, if you ignore the summary info stuff)

I think this is a valid assumption.  Most installations with large IO
needs will have many RRDs with the same geometry.  Striping does make it
difficult to adjust the geometry later, though...

> In practice, RRDs consist of multiple RRAs, some of which have the same
> size and interval.

What do you mean by "interval"?  The number of PDPs per CDP?  Or the PDP step?

> Therefore, rather than striping RRDs, it might be even more useful to 
> stripe the RRAs, maybe even on separate block devices.  This would 
> achieve two benefits:
> 
> - similar RRAs within an RRD would get striped with each other, just as
> for DSs

Not sure what you mean by this statement.

Do you mean that a.rrd, b.rrd, and c.rrd would turn into these files:

	file 0 = { a.rra[0], b.rra[0], c.rra[0] }
	file 1 = { a.rra[1], b.rra[1], c.rra[1] }
	... and so on?
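
To make the question concrete, here is a rough sketch of the grouping I have
in mind.  The function name and the input shape (RRD name mapped to its RRA
count) are made up for illustration; nothing like this exists in rrdtool.

```python
from collections import defaultdict


def group_by_rra_index(rrds):
    """Group RRAs by their index across RRDs.

    rrds: dict mapping a hypothetical RRD name to its number of RRAs.
    Returns a dict mapping RRA index -> list of "name.rra[i]" labels,
    i.e. the members that would share one stripe file.
    """
    files = defaultdict(list)
    for name in sorted(rrds):
        for i in range(rrds[name]):
            files[i].append(f"{name}.rra[{i}]")
    return dict(files)


# Three RRDs with identical geometry (two RRAs each), as assumed above.
layout = group_by_rra_index({"a": 2, "b": 2, "c": 2})
for idx, members in sorted(layout.items()):
    print(f"file {idx} = {{ {', '.join(members)} }}")
```

If the RRDs do not all have the same number of RRAs, the higher-numbered
files simply end up with fewer members in this sketch.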

> - if you have varied RRDs, where only some of the RRAs are similar, then 
> the similar RRAs could get striped together and achieve some benefits

The previous discussions around striping were about storing multiple RRD
databases in one large striped file.  To find an RRD we'd need a way to
map:

	RRD --> ( stripe file, stripe slot )

If we combined your two ideas, then it sounds like we'd need a way to map:

	(RRD, RRA) --> (stripe file, stripe slot)

How are you planning to manage all the meta-information?  What would we do
with an RRD database that's missing an RRA (say, for example, because the
stripe file for that RRA ended up in lost+found)?
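
As a minimal sketch of the bookkeeping this implies, assuming an in-memory
catalog (a real one would have to be persisted somewhere, which is exactly
the meta-information problem).  The class, its methods, and the stripe file
names are all hypothetical.

```python
import os


class StripeCatalog:
    """Maps (RRD, RRA index) -> (stripe file, stripe slot)."""

    def __init__(self):
        self._slots = {}      # (rrd, rra_index) -> (stripe_file, slot)
        self._next_slot = {}  # stripe_file -> next unused slot number

    def allocate(self, rrd, rra_index, stripe_file):
        """Assign the next free slot in stripe_file, or return the existing one."""
        key = (rrd, rra_index)
        if key not in self._slots:
            slot = self._next_slot.get(stripe_file, 0)
            self._next_slot[stripe_file] = slot + 1
            self._slots[key] = (stripe_file, slot)
        return self._slots[key]

    def lookup(self, rrd, rra_index):
        """Return (stripe_file, slot); raises KeyError if never allocated."""
        return self._slots[(rrd, rra_index)]

    def missing_rras(self, rrd, exists=os.path.exists):
        """List the RRA indexes of rrd whose stripe file is gone."""
        return sorted(
            rra
            for (name, rra), (path, _slot) in self._slots.items()
            if name == rrd and not exists(path)
        )


catalog = StripeCatalog()
catalog.allocate("a", 0, "stripe0.dat")
catalog.allocate("b", 0, "stripe0.dat")
catalog.allocate("a", 1, "stripe1.dat")

# Pretend stripe1.dat was lost; the catalog can at least report the damage.
lost = catalog.missing_rras("a", exists=lambda path: path != "stripe1.dat")
```

This only detects the missing RRA.  What rrdtool should then do with the
remaining RRAs of that database is still the open question.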

> What this boils down to is that there would potentially be an API 
> where rrdtool can ask for block storage for each RRA, and the 
> implementation would decide where to put the RRA.  This wouldn't be 100% 
> compatible with the model currently used in rrd_open.c, although it 
> would still be possible to create the traditional RRD files.

One good thing about this approach is that you could create tiered storage
based on the rra.pdp_per_row.  The RRAs that get updated the most could
be on faster media than the long-term storage RRAs that are updated less
often.
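
A small sketch of what such a tiering policy could look like.  A row in an
RRA is written once every step * pdp_per_row seconds, so pdp_per_row is a
direct proxy for update frequency.  The threshold, the tier names, and the
example geometry below are assumptions chosen for illustration only.

```python
def seconds_between_rows(step, pdp_per_row):
    """How often a new row is written to an RRA, in seconds."""
    return step * pdp_per_row


def pick_tier(pdp_per_row, fast_max_pdp_per_row=12):
    """Place frequently updated RRAs on fast media, the rest on slow media.

    fast_max_pdp_per_row is a hypothetical cut-off, not an rrdtool setting.
    """
    return "fast" if pdp_per_row <= fast_max_pdp_per_row else "slow"


# Illustrative geometry: 300 second step, four RRAs.
step = 300
placement = {
    pdp_per_row: (seconds_between_rows(step, pdp_per_row), pick_tier(pdp_per_row))
    for pdp_per_row in (1, 6, 24, 288)
}
for pdp_per_row, (interval, tier) in placement.items():
    print(f"pdp_per_row={pdp_per_row:3d}  row every {interval:5d}s  -> {tier}")
```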

However, both the striping and RRA-separation ideas create their own
meta-problems that are currently solved by the file-system...

-- 
 kevin brintnall =~ /kbrint at rufus.net/