[rrd-developers] Re: How to get the most performance when using lots of RRD files
henrik at hswn.dk
Wed Aug 16 13:38:11 MEST 2006
On Wed, Aug 16, 2006 at 07:19:01AM -0400, Richard A Steenbergen wrote:
> On Wed, Aug 16, 2006 at 08:10:09AM +0200, Henrik Stoerner wrote:
> > However, my main system for this currently has about 20.000 RRD files,
> > all of which are updated every 5 minutes. So that's about 70 updates
> > per second, and I can see that the amount of disk I/O happening on
> > this server will become a performance problem soon, as more systems are
> > added and hence more RRD files need updating.
> The situation I was trying to solve involved a constant stream of high
> resolution data across a large set of records, and relatively infrequent
> viewing of that data. It sounds like you're trying to do something
> similar. Honestly if all you care about is databasing it would probably be
> easier to ditch RRD and use something else or write your own db which is
> more efficient, but at the end of the day (for me anyways :P) rrdtool does
> the best job of producing pretty pictures that don't look like they came
> off of gnuplot or my EKG, and I'm in no mood to become a graphics person
> and re-invent the wheel.
I would be very sad to drop RRDtool, for those very reasons. It is the
de facto standard for storing time-series data on Unix, and there
are so many neat utilities around for working with RRD files.
> So, probably your biggest issue is indeed thrashing the hell out of the
> disk if you just tried to naively fire off a pile of forks and hope it all
> works out for the best. [snip]
> Obviously a syscall to exec a shell to run the rrdtool binary every time
> scales to about nothing, and the API (if you can even call it that, I
> don't think (argc, argv) counts :P) to rrdtool functions in C really and
> truly bites. If your application is in C, and you can link directly to the
> librrd, that's a quick and dirty fix for at least some of the evils.
That is basically what I do.
The fork()/exec() calls have been eliminated, since Hobbit uses a module
which calls directly into the rrdtool library API. So I am calling
the rrd_update() function directly. (Whew - wouldn't even dare to think
how much more overhead it would be to do the updates via the rrdtool
> The big daddy of performance suck is then going to be, opening, closing,
> and seeking the right spot in the files every time.
I can see you've been through many of the same deliberations as I have,
and come to just about the same conclusions. More spindles would help,
but only up to a point. Using RAM disks and keeping a cache of open file
handles is not going to work with the amount of data I have, unfortunately.
Consolidating datapoints into fewer files is a possibility, but at the
cost of making the code doing updates more complex - it is not
guaranteed that all of the data-updates will be available simultaneously.
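
To illustrate the extra complexity: with several data sources in one
file, the updater has to hold values back until a full row is available,
or give up and write the missing ones as unknown. Below is a rough sketch
of that buffering, not what Hobbit does. The class name UpdateBuffer and
the data-source names are made up for the example; the only rrdtool
behaviour assumed is the "timestamp:value:value" update format and "U"
for an unknown value.

```python
class UpdateBuffer:
    """Collects per-data-source values until a full row can be written.

    Hypothetical helper for illustration; it only builds the update
    string and leaves the actual call into rrdtool to the caller.
    """

    def __init__(self, ds_names):
        # Order must match the order of the data sources in the RRD file.
        self.ds_names = list(ds_names)
        self.pending = {}  # timestamp -> {ds_name: value}

    def add(self, timestamp, ds_name, value):
        """Store one value; return an update string once the row is complete."""
        if ds_name not in self.ds_names:
            raise KeyError(ds_name)
        row = self.pending.setdefault(timestamp, {})
        row[ds_name] = value
        if len(row) == len(self.ds_names):
            return self.flush(timestamp)
        return None

    def flush(self, timestamp):
        """Emit the row for `timestamp`, writing 'U' for values that never arrived."""
        row = self.pending.pop(timestamp, {})
        fields = [str(row.get(name, "U")) for name in self.ds_names]
        return f"{timestamp}:" + ":".join(fields)
```

A caller would feed values in as they arrive and call flush() on a
timeout for rows that never complete. Since rrdtool only accepts updates
with increasing timestamps, rows would also have to be flushed in time
order, which is one more thing the single-value-per-file layout avoids.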
> Or hell you could always just throw more spindles at it or throw a few
> more $500 linux PCs at it, what do I care. :)
Throwing cheap PCs at the problem is kind of what I was thinking of :-)
I'd like to spread the RRD files across a number of cheap servers,
but in a way that makes it easy to add more servers if it becomes
Anyway, thanks for your comments. They assure me there isn't some
obvious solution that I've missed.
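
On spreading the RRD files across servers so that more can be added
later: one possible approach (an assumption on my part, nothing that
exists in Hobbit or rrdtool) is rendezvous hashing, where each file goes
to the server with the highest hash score for that file name. The
function name pick_server and the server names are hypothetical.

```python
import hashlib


def pick_server(rrd_name, servers):
    """Return the server responsible for `rrd_name` (rendezvous hashing).

    Every server gets a score derived from hashing the server name
    together with the file name; the highest score wins.
    """
    def score(server):
        digest = hashlib.sha1(f"{server}/{rrd_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)


# Example: the same file name always maps to the same server.
servers = ["rrd1", "rrd2", "rrd3"]
print(pick_server("host42/cpu.rrd", servers))
```

The point of this scheme is that when a server is added, a file either
stays where it was or moves to the new server; nothing gets shuffled
between the existing ones. With N servers after the addition, roughly
1/N of the files would need to be copied over.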