[rrd-developers] Re: How to get the most performance when using lots of RRD files

Ole Bjørn Hessen obh at telenor.net
Thu Aug 17 09:45:21 MEST 2006


Tarus Balog <tarus at opennms.org> writes:
> Ole Bjørn Hessen wrote:
> 
> > Another solution is to make the RRD-files smaller :-)
> 
> We found the exact opposite to be true. We have a couple of clients who
> are updating 500K+ values every five minutes. We used to store each
> value in its own RRD file, but found that if we group them together
> things run much faster.

Sorry about the muddled thinking on my part; I was talking about
reading (graphing) speed. When you group values together, the RRD files
get much larger. An example I have is a 6.5 MB RRD file (8 interfaces
with 8 counters per interface); when drawing a graph, rrdtool read
2.1 MB of that file. Compare this with a 300 KB file where you only
have to read perhaps 100 KB: graphing slows down ten-fold or more.
Reducing your RRAs will shrink your RRD files and speed up your reads.
You have to balance the size of your RRD files against your customers'
requirements for graphing speed.

In short there is no such thing as a free lunch :-)
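
To make the arithmetic concrete, here is a back-of-envelope sketch in
Python (not RRDtool code; the layout detail is my assumption). RRD
stores each RRA row with one 8-byte value per data source side by
side, so graphing a single counter still touches every counter in each
row:

    def bytes_read_for_graph(rows_needed, ds_count, value_size=8):
        """Approximate bytes touched in the RRA region for one graph."""
        return rows_needed * ds_count * value_size

    # One day of 5-minute samples is 288 rows. Compare a file holding
    # 8 interfaces x 8 counters (64 data sources) against one counter
    # per file:
    grouped = bytes_read_for_graph(rows_needed=288, ds_count=64)
    single = bytes_read_for_graph(rows_needed=288, ds_count=1)
    print(grouped, single, grouped // single)  # 147456 2304 64

With these assumed numbers the grouped file reads 64 times as much RRA
data per graph, which is the same effect as the 2.1 MB vs. 100 KB
observation above.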

> The final thing that we did was abstract data collection from data
> storage. Things would be going fine, and then I'd do something like a
> "find" to see how many files we were updating, and bam, chaos, the
> raptors were out, everything went sour.
> 
> By adding a queue to store values waiting to be written, we were able to
> absorb such file system shocks. It takes just about as much time to
> write two sets of values as one (most of the overhead is in the headers)
> and so we can catch up.

An example: the usual update of a 6.5 MB RRD file involves reading and
rewriting about 50 KB at the start of the file (the header), plus
writing a single data block further into the file. On a smart system
the 50 KB header will already sit in the read cache (RAM), the write
will be compared against the cached copy, and perhaps only the dirty
blocks of the updated header will be flushed to disk.
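
For what it's worth, here is a minimal sketch of that queueing idea,
assuming the stock "rrdtool update" command line, which accepts
several timestamp:value arguments in one call, so the header
read/write is paid once per flush instead of once per sample:

    import subprocess
    from collections import defaultdict

    pending = defaultdict(list)  # rrd file -> queued "timestamp:value"

    def queue_update(rrd_file, timestamp, value):
        """Buffer a sample instead of writing it out immediately."""
        pending[rrd_file].append("%d:%s" % (timestamp, value))

    def flush():
        """Write all queued samples, one rrdtool invocation per file,
        amortizing the header overhead across N updates."""
        for rrd_file, samples in pending.items():
            subprocess.run(["rrdtool", "update", rrd_file] + samples,
                           check=True)
        pending.clear()

Call flush() from a timer; a burst of file-system load then only
delays the flush instead of backing up the collectors.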

As the number of RRD files grows, performance depends increasingly on
every aspect of how your system is set up: which OS you run, how much
RAM you have, what kind of disks you use, the network speed, which
NetApp version you run, how your RAID is configured. And if other
users share the system, you basically have an unstable system.

I see a couple of ways out of this complexity problem:

1. Rewrite RRDtool to separate the database from the graphing, and
   wait for someone to come up with a much smarter database method ;-)

2. Wait for cheap RAM disks, so that read/write speed no longer
   matters.

Ole Bjørn Hessen,
NMS-IP, PF-Nett, Telenor Networks
