[rrd-users] [unsure] max DS per rrd file

Ryan Kubica kubicaryan at yahoo.com
Thu Apr 25 03:18:23 CEST 2013

Hi Mikel,

I've personally never found a good reason to store more than one datasource per RRD datafile, and I run very large rrdtool data servers (multiple millions per server, across many servers).

There are far too many edge cases, latency issues and join overheads in trying to consolidate datasources into a single datafile. Yes, rrdtool itself is more efficient with a multi-datasource insert like that, but: 1) what if the datapoints are collected at different times? 2) what if they have different steps? 3) what if you want to add a datasource? 4) what if you simply have too many datasources to order and consolidate from a queue into the datafile? There is also non-trivial complexity and overhead in indexing into an RRD datafile for specific datasources.
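
A minimal sketch of why questions 1) to 3) mostly go away with one datasource per file, assuming samples arrive on a queue as (metric, timestamp, value) tuples and each metric lives in a hypothetical `<base>/<metric>.rrd` file. Samples are grouped per file with no cross-metric alignment step, and adding a datasource just means creating a new file. `rrdtool update` accepts several `timestamp:value` arguments in one call; the function and naming scheme here are illustrative only.

```python
from collections import defaultdict
from pathlib import Path

def update_commands(base: Path, samples: list[tuple[str, int, float]]) -> list[list[str]]:
    """Turn queued (metric, timestamp, value) samples into one `rrdtool update`
    argument list per metric file. Metrics are never aligned with each other:
    each file gets its own timestamps."""
    per_metric: dict[str, dict[int, float]] = defaultdict(dict)
    for metric, ts, value in samples:
        # A later sample for the same timestamp replaces the earlier one,
        # because rrdtool rejects non-increasing timestamps within a file.
        per_metric[metric][ts] = value
    commands = []
    for metric in sorted(per_metric):
        # Sort per file only; timestamps must also be newer than the
        # file's last update, which this sketch does not check.
        points = sorted(per_metric[metric].items())
        # Hypothetical layout: <base>/<metric>.rrd, one datasource per file.
        path = base / f"{metric}.rrd"
        commands.append(["rrdtool", "update", str(path),
                         *(f"{ts}:{value}" for ts, value in points)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`.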

Linux is extremely efficient at block updates, caching, opens/closes, etc. On a low-end (4-CPU) server with limited memory, rrdtool can easily handle updates for 160 thousand datasources per minute; on a better server, a whole lot more than that.
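
As a rough per-update budget for that figure, assuming one update per datasource per minute and (purely as an assumption) work spread evenly across the four CPUs:

```latex
\frac{160\,000\ \text{updates}}{60\ \text{s}} \approx 2\,667\ \text{updates/s},
\qquad
\frac{2\,667\ \text{updates/s}}{4\ \text{CPUs}} \approx 667\ \text{updates/s per CPU}
\;\Rightarrow\; \approx 1.5\ \text{ms per update per CPU}.
```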

'Distributed cluster' isn't a good reason not to send all your time-series data to one server or a small set of servers. The latency/request time incurred in having to fetch data from many servers is usually not worth the trade-off.

Graphs of many hundreds of datasources, computed over multi-day/multi-week time ranges, are generated in tens of milliseconds, not seconds. rrdtool is quite capable of producing hundreds of on-demand graphs per second from one server.
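
With one datasource per file, the "join" happens at graph time: one `DEF` per file, then one `LINE` per definition. A minimal sketch that only builds the argument list, assuming every file holds a datasource named `value` (hypothetical) and that paths contain no `:` characters (which `DEF` would need escaped). The palette and the default `-7d` start are arbitrary choices.

```python
from pathlib import Path

PALETTE = ["#1F77B4", "#FF7F0E", "#2CA02C", "#D62728"]  # arbitrary line colours

def graph_command(png: Path, rrd_files: list[Path], start: str = "-7d") -> list[str]:
    """Build an `rrdtool graph` argument list drawing one line per RRD file."""
    cmd = ["rrdtool", "graph", str(png), "--start", start]
    for i, f in enumerate(rrd_files):
        vname = f"ds{i}"  # vnames must be unique within one graph
        color = PALETTE[i % len(PALETTE)]
        # "value" is an assumed DS name inside each single-DS file.
        cmd.append(f"DEF:{vname}={f}:value:AVERAGE")
        cmd.append(f"LINE1:{vname}{color}:{f.stem}")  # legend = file name
    return cmd
```

Usage: `subprocess.run(graph_command(Path("out.png"), files), check=True)`.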

I suggest you write a little test script that writes rrd data to individual rrd datafiles, to see how quick your servers are at it. Some OS tuning and rrdtool RRA sizing will help; especially, don't keep hourly or daily rollups: the server has to hold onto those blocks to keep the consolidation quick and avoid a read from disk.
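
One way such a test script might look, as a sketch rather than a tuned benchmark. It assumes the `rrdtool` binary is on PATH and feeds commands through its pipe mode (`rrdtool -`) so that no process is spawned per update. The 60 s step, the single one-day RRA (no rollups), the DS name `value` and the file counts are all assumptions to adjust; the measured rate includes Python overhead.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path

STEP = 60    # assumed update interval, seconds
ROWS = 1440  # one day at step resolution; no hourly/daily rollup RRAs

def create_script(paths: list[Path], start: int) -> str:
    # One datasource per file; "value" is a hypothetical DS name.
    # Assumes paths contain no whitespace (pipe mode splits on it).
    return "".join(
        f"create {p} --start {start} --step {STEP} "
        f"DS:value:GAUGE:{2 * STEP}:U:U RRA:AVERAGE:0.5:1:{ROWS}\n"
        for p in paths
    )

def update_script(paths: list[Path], rounds: int, start: int) -> str:
    # Timestamps must strictly increase per file: round r writes start + r*STEP.
    lines = []
    for r in range(1, rounds + 1):
        ts = start + r * STEP
        lines += [f"update {p} {ts}:{float(i + r)}\n" for i, p in enumerate(paths)]
    return "".join(lines)

def run_pipe(script: str) -> None:
    # `rrdtool -` reads one command per line from stdin.
    proc = subprocess.run(["rrdtool", "-"], input=script, text=True,
                          capture_output=True, check=True)
    output = proc.stdout + proc.stderr
    if "ERROR" in output:
        raise RuntimeError(output[output.index("ERROR"):][:200])

def benchmark(n_files: int = 1000, rounds: int = 3) -> float:
    """Return measured updates per second across n_files separate RRD files."""
    if shutil.which("rrdtool") is None:
        raise RuntimeError("rrdtool not found on PATH")
    # Start in the past so every update timestamp is <= now.
    start = int(time.time()) - (rounds + 1) * STEP
    with tempfile.TemporaryDirectory() as d:
        paths = [Path(d) / f"ds{i}.rrd" for i in range(n_files)]
        run_pipe(create_script(paths, start))  # setup, not timed
        t0 = time.perf_counter()
        run_pipe(update_script(paths, rounds, start))
        elapsed = time.perf_counter() - t0
    return n_files * rounds / elapsed
```

Usage: `print(f"{benchmark(10_000):.0f} updates/s")`. Creation is kept out of the timed section because it is a one-off cost per datasource.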

rrdtool scales rather simply (and without rrdcached, as I don't use that either).


 From: mikel <infoeuskadi at gmail.com>
To: rrd-users at lists.oetiker.ch 
Sent: Saturday, April 20, 2013 4:48 AM
Subject: Re: [rrd-users] [unsure]  max DS per rrd file

Thanks for your fast reply again.

>Maybe I don't understand what you say here. Some metrics, or all metrics
>queried? Both statements cannot be true at the same time?

Yes, it is a tricky case. Apologies, I was not clear enough.

In most cases all metrics are queried at the same time, because we want to
know what value they had at a given time and classify them.

Only very occasionally would we query for just one metric.

>Anyway, if you query only once in a while, maybe you should think about 
>reducing the number of RRAs in each RRD, and just let it consolidate at 
>graph time. Yes, this will mean you will have to wait longer for your graph 
>to be made, but you save processing time at every update.
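
To put numbers on the "fewer RRAs" suggestion: an RRA row count is the retention divided by the time each row covers (step times primary data points per row). A minimal sketch, with a 60 s step and the retention periods chosen purely as examples. Every extra rollup RRA adds consolidation work to each update; keeping only the step-resolution RRA moves that work to graph time, and its retention then bounds how far back graphs can reach.

```python
import math

def rra(retention_s: int, step_s: int, steps_per_row: int = 1,
        cf: str = "AVERAGE", xff: float = 0.5) -> str:
    """Build an RRA definition holding `retention_s` seconds of data, where each
    row consolidates `steps_per_row` primary data points of `step_s` seconds."""
    rows = math.ceil(retention_s / (step_s * steps_per_row))
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"

DAY = 86_400
# Minimal layout: a single step-resolution RRA (example: one week at 60 s).
minimal = [rra(7 * DAY, 60)]
# Conventional layout: the same plus hourly and daily rollups.
with_rollups = [rra(7 * DAY, 60), rra(365 * DAY, 60, 60), rra(5 * 365 * DAY, 60, 1440)]
```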

This is interesting; I did not think about that. Thanks for the hint.

Thanks for your help again.

