[rrd-developers] Re: How to get the most performance when using lots of RRD files

Wed Aug 16 13:19:01 MEST 2006

On Wed, Aug 16, 2006 at 08:10:09AM +0200, Henrik Stoerner wrote:
> I am using a network/systems monitoring tool - Hobbit - which uses 
> lots of RRD files for tracking all sorts of data. This works really
> well - kudos to Tobi.
> 
> However, my main system for this currently has about 20.000 RRD files,
> all of which are updated every 5 minutes. So that's about 70 updates
> per second, and I can see that the amount of disk I/O happening on
> this server will become a performance problem soon, as more systems are
> added and hence more RRD files need updating.

I've been in a similar situation myself, doing 20-30 sec updates on 50k+ 
RRD files. The bottom line is that rrdtool is just not designed to do 
that, and it will go kicking and screaming into the night when you try to 
make it. The "typical user" is calling the rrdtool binary from a perl 
script, graphing a few dozen or at worst hundreds of items, and doesn't 
have a care in the world about the internal architecture. 

The situation I was trying to solve involved a constant stream of high 
resolution data across a large set of records, and relatively infrequent 
viewing of that data. It sounds like you're trying to do something 
similar. Honestly if all you care about is databasing it would probably be 
easier to ditch RRD and use something else or write your own db which is 
more efficient, but at the end of the day (for me anyways :P) rrdtool does 
the best job of producing pretty pictures that don't look like they came 
off of gnuplot or my EKG, and I'm in no mood to become a graphics person 
and re-invent the wheel.

So, probably your biggest issue is indeed thrashing the hell out of the 
disk if you just naively fire off a pile of forks and hope it all 
works out for the best. In my application I implemented a data write queue 
and a single thread per disk for dispatching rrd updates, which helps 
quite a bit. It really depends on your polling application as to how easy 
this is though.
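
A minimal sketch of that write queue, in Python rather than the C I 
actually used. Everything named here is made up for illustration: 
disk_for() is a hypothetical function mapping a file path to whatever 
identifies its disk, and apply_update() is a hypothetical callback that 
does the real write (via librrd or anything else). The point is only 
the shape: pollers enqueue, one thread per disk drains, and samples for 
the same file get handed over together.

```python
import queue
import threading
from collections import defaultdict


class DiskWriter(threading.Thread):
    """One worker per disk; applies queued updates serially, grouped by file."""

    def __init__(self, apply_update):
        super().__init__(daemon=True)
        self.q = queue.Queue()
        self.apply_update = apply_update  # hypothetical: (path, [samples]) -> None

    def run(self):
        stop = False
        while not stop:
            pending = [self.q.get()]      # block until there is work
            while True:                   # then grab everything else already queued
                try:
                    pending.append(self.q.get_nowait())
                except queue.Empty:
                    break
            batch = defaultdict(list)
            for item in pending:
                if item is None:          # sentinel: finish this batch, then exit
                    stop = True
                    continue
                path, sample = item
                batch[path].append(sample)
            for path, samples in batch.items():
                self.apply_update(path, samples)


class UpdateDispatcher:
    """Routes each update to the writer thread that owns the file's disk."""

    def __init__(self, disk_for, apply_update):
        self.disk_for = disk_for          # hypothetical: path -> disk identifier
        self.apply_update = apply_update
        self.writers = {}
        self._lock = threading.Lock()     # pollers may call submit() concurrently

    def submit(self, path, sample):
        disk = self.disk_for(path)
        with self._lock:
            writer = self.writers.get(disk)
            if writer is None:
                writer = DiskWriter(self.apply_update)
                writer.start()
                self.writers[disk] = writer
        writer.q.put((path, sample))

    def close(self):
        for writer in self.writers.values():
            writer.q.put(None)
        for writer in self.writers.values():
            writer.join()
```

Because each disk has exactly one writer and one FIFO queue, samples for 
a given file arrive at apply_update() in the order they were submitted, 
and no two threads ever fight over the same spindle.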

Obviously a syscall to exec a shell to run the rrdtool binary every time 
scales to about nothing, and the API (if you can even call it that, I 
don't think (argc, argv) counts :P) to rrdtool functions in C really and 
truly bites. If your application is in C, and you can link directly to 
librrd, that's a quick and dirty fix for at least some of the evils. What 
really should happen is for that entire section of code to be gutted with 
a vengeance: split the text parsing code out of it and send it in the 
direction of the cli frontend, and develop an actual API for passing in 
data in a sensible format for other users who want to link to a C lib. 
This really isn't that difficult to do either.
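
To make the split concrete, here is a rough sketch (in Python for 
brevity, not a proposal for the real C interface). Sample and 
to_cli_args() are hypothetical names I'm inventing for illustration. 
The core would take a typed record, and only the cli frontend would 
ever deal with the "timestamp:value:value" text form and the "U" 
marker for unknown values that rrdtool update accepts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Sample:
    """A typed update record (hypothetical): what a library caller would pass."""
    timestamp: int                        # seconds since the epoch
    values: Dict[str, Optional[float]]    # data source name -> value, None = unknown


def to_cli_args(path: str, sample: Sample) -> List[str]:
    """Render a Sample as the argv-style text form used by `rrdtool update`.

    This is the only place that knows about the text syntax; a library
    caller holding a Sample would never need to build or parse strings.
    """
    names = list(sample.values)
    fields = [
        "U" if sample.values[name] is None else repr(sample.values[name])
        for name in names
    ]
    return [
        "update",
        path,
        "--template",
        ":".join(names),
        ":".join([str(sample.timestamp)] + fields),
    ]
```

For example, to_cli_args("x.rrd", Sample(1155727141, {"in": 1.5, "out": 
None})) gives ["update", "x.rrd", "--template", "in:out", 
"1155727141:1.5:U"].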

The big daddy of performance suck is then going to be opening, closing, 
and seeking to the right spot in the files every time. Again, perfectly 
straightforward for very light scripty use, but using .rrd files as an 
indexing method for large datasets scales horribly. One thing you could do 
if you really wanted to scale this db format (since the updates are 
relatively simple compared to the graphing) is to write your own code to 
keep open handles on the files and do your own direct db access. This 
would be fairly effective up to a point. Obviously there is a limit to the 
number of files you can keep open on your OS, but by the point you reach 
it you've probably crossed that threshold to where looking at a different 
solution to replace rrd completely is worth your time again. Of course, 
also make sure that your polling app isn't completely braindead, because 
you can do plenty of intelligent aggregation of datasources inside a 
single .rrd file.
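
A sketch of the open-handle idea, again in Python and again with 
invented names (HandleCache, write_at). It keeps at most max_open files 
open and closes the least recently used one when it hits the cap, so 
you stay under the OS descriptor limit. It deliberately knows nothing 
about the .rrd on-disk layout: the offset passed to write_at() is a 
placeholder for whatever your own direct-access code computes.

```python
from collections import OrderedDict


class HandleCache:
    """Keeps a bounded set of files open, evicting the least recently used."""

    def __init__(self, max_open=512):
        self.max_open = max_open          # keep this below the per-process fd limit
        self._open = OrderedDict()        # path -> open file object, oldest first

    def get(self, path):
        fh = self._open.get(path)
        if fh is not None:
            self._open.move_to_end(path)  # mark as most recently used
            return fh
        if len(self._open) >= self.max_open:
            _, victim = self._open.popitem(last=False)
            victim.close()                # closing also flushes buffered writes
        fh = open(path, "r+b")            # file must already exist
        self._open[path] = fh
        return fh

    def write_at(self, path, offset, data):
        """Write raw bytes at a caller-computed offset (layout not modeled here)."""
        fh = self.get(path)
        fh.seek(offset)
        fh.write(data)

    def close_all(self):
        while self._open:
            _, fh = self._open.popitem()
            fh.close()
```

With 50k files and a cap of a few hundred this only helps if your update 
order has some locality, which is one more reason to pack related 
datasources into the same file.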

One option I explored for doing 10 sec updates was to keep my .rrd files 
in a ram disk, and periodically sync to disk at intervals where you want 
to save long term data (say for example 5 minutes, so you only lose 5 mins 
of data in the event of a failure). Of course the problem I ran into is 
that in addition to doing very high resolution short term data collection 
(it makes for really nice graphs of realtime data, honest :P), I'm storing 
a fair amount of long term data too. This means that it is perfectly 
reasonable for a .rrd file to be large (say 500KB-1MB), but for only a few 
KB of the data per file to actually be touched on any given update 
interval. What you'd really be looking for out of a ram disk there is 
file/disk-backed storage and a really slow periodic flush of dirty blocks 
to disk, which is again probably more work than you should put into a hack 
around rrdtool. Of course if you can afford the ram in the first place to 
make all your data fit, you can just dd a raw image at the block level and 
get much less disk thrashing than accessing tens of thousands of small 
files.
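
For reference, the naive file-level version of the periodic sync looks 
something like the sketch below (Python, hypothetical names, not what I 
ran in production). sync_changed() copies every .rrd file in the ram 
disk directory whose mtime or size changed since the last pass. Note 
that it copies whole files, which is exactly the weakness described 
above: a 1MB file gets rewritten even if only a few KB changed.

```python
import os
import shutil


def sync_changed(ram_dir, disk_dir, seen):
    """Copy .rrd files that changed since the last pass from ram_dir to disk_dir.

    `seen` is a dict the caller keeps between passes, mapping file name to
    the (mtime_ns, size) pair recorded at the previous copy.
    Returns the sorted list of file names copied on this pass.
    """
    copied = []
    os.makedirs(disk_dir, exist_ok=True)
    with os.scandir(ram_dir) as entries:
        for entry in entries:
            if not entry.is_file() or not entry.name.endswith(".rrd"):
                continue
            st = entry.stat()
            stamp = (st.st_mtime_ns, st.st_size)
            if seen.get(entry.name) == stamp:
                continue                  # untouched since last sync
            # Copy to a temp name first so a crash never leaves a torn file
            # under the real name, then rename into place.
            tmp = os.path.join(disk_dir, entry.name + ".tmp")
            shutil.copyfile(entry.path, tmp)
            os.replace(tmp, os.path.join(disk_dir, entry.name))
            seen[entry.name] = stamp
            copied.append(entry.name)
    return sorted(copied)
```

Call it from a timer every 5 minutes (or whatever amount of data you can 
stand to lose) with the same `seen` dict each time.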

Or hell, you could always just throw more spindles at it or throw a few 
more $500 linux PCs at it, what do I care. :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
