[rrd-users] Scaling rrd tables for best performance
Eduardo M. Bragatto
eduardo at bragatto.com
Thu Dec 6 04:36:31 CET 2007
Hi there...
I'm about to start using rrd to measure several aspects of a few hundred
servers. Just as an example, every server will have its CPU (idle,
kernel, user, iowait, etc.), its memory (used real, buffers, cache,
unused real, swap, total real), its partition space (two DSes per
partition: used and free) and a few other interesting values being
monitored. In the end, one server alone will have 25 to 30 different
DSes.
I can store each server's information in a single "server.rrd" file, like:
server1.rrd, server2.rrd, etc...
Or have it split among several rrd files for the same server, like:
server1_cpu.rrd, server1_mem.rrd, server1_network.rrd,
server1_generic.rrd, server2_cpu.rrd, etc...
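
To make the two layouts concrete, here is a minimal sketch that builds the
"rrdtool create" command lines for each one. Only the CPU and memory groups
from the description above are included. The DS names, the GAUGE type, the
300-second step, the 600-second heartbeat and the single RRA are all
assumptions chosen for illustration, not values from the actual setup.

```python
# DS groups taken from the description above; the exact DS names are
# illustrative (rrdtool DS names must be short and use [a-zA-Z0-9_]).
GROUPS = {
    "cpu": ["idle", "kernel", "user", "iowait"],
    "mem": ["used_real", "buffers", "cache", "unused_real", "swap", "total_real"],
}

STEP = 300                      # assumed: seconds between updates
HEARTBEAT = 600                 # assumed: twice the step
RRA = "RRA:AVERAGE:0.5:1:288"   # assumed: one day of 5-minute averages


def ds_spec(name):
    """Return one DS definition; GAUGE with a minimum of 0 is an assumption."""
    return f"DS:{name}:GAUGE:{HEARTBEAT}:0:U"


def create_cmd(filename, ds_names):
    """Build the argument list for one 'rrdtool create' invocation."""
    return (["rrdtool", "create", filename, "--step", str(STEP)]
            + [ds_spec(n) for n in ds_names]
            + [RRA])


def single_file_layout(server):
    """One file per server; DS names are prefixed with their group."""
    names = [f"{group}_{ds}" for group, dss in GROUPS.items() for ds in dss]
    return [create_cmd(f"{server}.rrd", names)]


def split_layout(server):
    """One file per (server, group) pair."""
    return [create_cmd(f"{server}_{group}.rrd", dss)
            for group, dss in GROUPS.items()]


if __name__ == "__main__":
    for cmd in single_file_layout("server1") + split_layout("server1"):
        print(" ".join(cmd))
```

The script only prints the commands, so it can be run without rrdtool
installed. In the single-file layout the DS names carry a group prefix so
that they stay unique within one file.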
I'm going to start with around 500-600 servers, but I expect that to
grow to a few thousand over the next year and I would like to have
things scaled for that growth.
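
As a rough sizing aid, the sketch below works out how many files and
update calls each layout implies. The 30 DSes per server and the four
files per server (cpu, mem, network, generic) come from the text above;
the 300-second step and the figure of 3000 servers standing in for
"a few thousand" are assumptions.

```python
def totals(servers, ds_per_server, files_per_server, step_seconds):
    """Return file count, DS count and average update calls per second,
    assuming one update call per file per step."""
    files = servers * files_per_server
    return {
        "files": files,
        "data_sources": servers * ds_per_server,
        "updates_per_second": files / step_seconds,
    }


if __name__ == "__main__":
    STEP = 300  # assumed step in seconds
    for servers in (600, 3000):          # 3000 is an assumed growth figure
        for label, files_per_server in (("single", 1), ("split", 4)):
            print(servers, label, totals(servers, 30, files_per_server, STEP))
```

Under these assumptions the split layout means four times as many files
and update calls for the same number of DSes; whether that costs more in
practice depends on how each update touches the disk.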
I have read (I don't remember where, so the source may not be reliable)
that rrdtool caches information in memory to speed up the real-time
calculations, but I don't yet understand how that would be possible
between two different measurements, since the "rrdupdate" process is not
a daemon that stays loaded in memory all the time, but is invoked at
every step. It gives me the impression that at every invocation of
rrdupdate, it copies all data from disk to memory, does all the
calculations and then flushes the data back to disk (causing some disk
activity, but that's understandable and desirable if you want to make
sure that as little data as possible is lost in case of a system crash).
Everything I'm saying is pretty much a guess based on the documentation
(which doesn't go that far into the rrdtool internals, for obvious
reasons).
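
To illustrate the kind of stateless update cycle guessed at above, here
is a toy model in Python. It is NOT rrdtool's file format or I/O code;
the header layout, the function names and the fsync call are all
assumptions made for the sketch. Each call opens a fixed-size
round-robin file, reads a small header, overwrites one slot, syncs and
exits, so nothing is kept in process memory between calls.

```python
import os
import struct

HEADER = struct.Struct("<II")   # hypothetical header: (slot count, next slot)
SLOT = struct.Struct("<d")      # one double per slot


def create(path, slots):
    """Create a fixed-size file with every slot set to NaN (unknown)."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(slots, 0))
        f.write(SLOT.pack(float("nan")) * slots)


def update(path, value):
    """Reopen the file, overwrite the next slot, advance the pointer, sync."""
    with open(path, "r+b") as f:
        slots, nxt = HEADER.unpack(f.read(HEADER.size))
        f.seek(HEADER.size + nxt * SLOT.size)
        f.write(SLOT.pack(value))
        f.seek(0)
        f.write(HEADER.pack(slots, (nxt + 1) % slots))  # wrap around
        f.flush()
        os.fsync(f.fileno())    # push the change to disk before exiting


def read_all(path):
    """Return (next slot index, list of slot values)."""
    with open(path, "rb") as f:
        slots, nxt = HEADER.unpack(f.read(HEADER.size))
        values = [SLOT.unpack(f.read(SLOT.size))[0] for _ in range(slots)]
    return nxt, values
```

In this toy model an update only touches the header and one slot rather
than the whole file, and the file size never changes. Whether rrdtool
itself behaves this way is exactly the open question here.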
So, my main questions are:
How does rrdtool handle the data I/O operations between disk and memory?
Is my understanding close to how things work, or am I completely wrong
about it?
Also, which of those two options would give me the best performance for
real-time monitoring? Having multiple rrd files, one for each aspect of
each host (read: having DSes from the same server in different files),
or having a single rrd file with all aspects of a given host (read:
having all DSes in the same file)?
Regards,
Eduardo M. Bragatto.