[rrd-developers] patch/tuning for very large RRD systems (was "Re: SQL Backend request")
plonka at doit.wisc.edu
Thu May 24 20:43:56 CEST 2007
On Thu, May 24, 2007 at 12:51:11PM -0400, Mark Plaksin wrote:
> Dave Plonka <plonka at doit.wisc.edu> writes:
> > Archit Gupta, Dale Carder, and I have just completed a research project
> > on RRD performance and scalability. I believe we may have the largest
> > single system MRTG w/RRD - over 300,000 targets and RRD files updated
> > every five minutes.
> Wow! Would you describe the hardware you are running that on? CPU,
> RAM, disk, and anything else you think is relevant?
Processors: 8 x Intel Xeon @ 2.7GHz
Processor Cache: 2 MB
Memory: 16 GB
Disk: SAN, RAID-10, 16 x 2 disks
Operating System: Linux 2.6.9
File System: ext3 and ext2, 4KB blocksize
I/O Scheduler: Deadline
I've attached a list with our other configuration recommendations.
While our system is certainly generously sized, it is a 3-year-old
machine. Note, however, that such a configuration couldn't even handle
100K RRD files without the patch to fadvise RANDOM to suppress readahead.
I believe any post-2.6.5 Linux has the posix_fadvise behavior that
the patch leverages.
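The advice the patch issues can also be given from user space. Here is a
minimal sketch using Python's os.posix_fadvise (Linux, Python 3.3+); the
temporary file is just a stand-in for a real ``.rrd'' file:

```python
import os
import tempfile

# Create a stand-in for an RRD file; a real deployment would open the
# actual ``.rrd'' file instead.
fd, path = tempfile.mkstemp(suffix=".rrd")
os.write(fd, b"\0" * 8192)

# POSIX_FADV_RANDOM tells post-2.6.5 Linux kernels to expect random
# access and suppress readahead -- the same advice the patch gives.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)

# ... the small, scattered reads and writes of an RRD update would follow ...

os.close(fd)
os.remove(path)
```

This is advisory only: the kernel is free to ignore it, but on Linux it
disables readahead for the file descriptor.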
Also, Tobi has integrated the patch in the code he's testing.
> We have about 45k RRDs and our testing so far says the fadvise changes
> are very nice--thanks! We're also testing local disk (via cciss driver)
> vs SAN storage. Our current RRD server is pretty crushed io-wise. So
> far the SAN storage looks like a big win too.
You should be able to use sar to determine that your reads (for
rrdtool) are much lower than your writes and that the CPU is not
spending too much time in I/O wait state. These are good indications
that (a) unnecessary readahead has been suppressed and (b) that the
buffer-cache is being used effectively.
I've also released a command called fincore ("File IN CORE") that you
can use to examine the buffer-cache to determine that the RRD files
(or any files) are cached as expected.
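The underlying mechanism fincore relies on is the mincore(2) system call.
A rough, self-contained approximation can be sketched in Python via ctypes;
this assumes Linux with glibc at ``libc.so.6'' (the real fincore tool is
the more convenient way to do this):

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)
PAGE = os.sysconf("SC_PAGE_SIZE")

def resident_pages(path):
    """Count how many of a file's pages are in the buffer-cache."""
    size = os.path.getsize(path)
    npages = (size + PAGE - 1) // PAGE
    with open(path, "rb") as f:
        # ACCESS_COPY gives a writable (copy-on-write) buffer, which
        # ctypes needs to take the mapping's address.
        m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
    vec = (ctypes.c_ubyte * npages)()
    addr = ctypes.addressof(ctypes.c_char.from_buffer(m))
    if libc.mincore(ctypes.c_void_p(addr), ctypes.c_size_t(size), vec):
        raise OSError(ctypes.get_errno(), "mincore failed")
    resident = sum(b & 1 for b in vec)  # low bit set => page is in core
    m.close()
    return resident, npages
```

On a warm file, resident should approach the total page count; on a file
evicted from the cache, it should approach zero.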
plonka at doit.wisc.edu http://net.doit.wisc.edu/~plonka/ Madison, WI
-------------- next part --------------
Performance Recommendations for RRD and MRTG Systems
* When building a very large RRD measurement system, dedicate the
machine to this purpose. Since RRD is a file-based database,
it relies on the buffer-cache that is shared across all system
activity. Because of RRD's unique file-access characteristics
and buffering requirements, it is easier to achieve performance
gains by tuning the system just for RRD.
* Use an RRDTool that has our fadvise RANDOM patch. On systems
that have a fairly aggressive initial readahead (such as Linux),
this will very likely increase file update performance by reducing
the page fault rate and the buffer-cache memory required.
* Avoid file-level backups of RRD files unless the set of RRD files
  fits completely into buffer-cache memory. File-level backups read
  each modified file completely and sequentially; this can fill
  the buffer-cache and subsequently cause more page faults on RRD
  updates. Backups are essentially indistinguishable from application
  access, and thus unnecessarily populate the system's buffer-cache
  with content that won't be re-used soon. (Note that backup
  programs could call fadvise NOREUSE or fadvise DONTNEED to inform
  the operating system that the file content will not be re-used.)
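A backup-style sequential reader can drop its pages from the cache as it
finishes. A minimal sketch, again using Python's os.posix_fadvise on a
temporary stand-in file:

```python
import os
import tempfile

# Stand-in for a file a backup program would read sequentially.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 65536)
os.close(fd)

fd = os.open(path, os.O_RDONLY)
total = 0
while True:
    chunk = os.read(fd, 16384)
    if not chunk:
        break
    total += len(chunk)
# POSIX_FADV_DONTNEED: the pages just read will not be re-used, so the
# kernel may evict them instead of displacing cached RRD pages.
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
os.close(fd)
os.remove(path)
```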
* Split MRTG targets into a number of groups and run a separate
daemon for each. In our system, we reconfigure daily and run a
target_splitter script to produce a new set of ``.cfg'' files, each
with approximately 10,000 targets per MRTG daemon. Note that polling
performance is also influenced by the SNMP agent performance on the
network device polled. So, if the splitting results in grouping
like targets together based on the model of device monitored,
there could be quite a disparity in time to complete the MRTG
``poll targets'' phase.
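The grouping step itself is simple to sketch. The following is a
hypothetical illustration (not the actual target_splitter script); dealing
targets out round-robin avoids grouping like device models together, per
the caveat above:

```python
def split_targets(targets, ngroups):
    """Deal targets out round-robin: devices of like model tend to be
    adjacent in a config, so this spreads them across the daemons."""
    groups = [[] for _ in range(ngroups)]
    for i, target in enumerate(targets):
        groups[i % ngroups].append(target)
    return groups

# e.g. 25 targets dealt into 4 per-daemon groups
groups = split_targets(["switch%03d" % n for n in range(25)], 4)
```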
* Do not create RRD files all at once. By staggering the start
times, updates to like RRAs will cross block boundaries
at different times, distributing the page faults that occur on
block boundary crossings. As a network is deployed and grows,
these RRD file start times would naturally be staggered, but this
could be quite different when introducing measurement to an existing network.
* Run a caching resolver or a nameserver on the localhost, i.e. the
MRTG system itself. This reduces ``poll targets'' latency
due to host name resolution; MRTG performs very many DNS name
resolutions when hostnames are used (rather than IP addresses)
in target definitions.
* Configure an appropriate number of forks for each MRTG daemon to
minimize the time for the ``poll targets'' phase. On our system,
4 forks per daemon works well to keep polling in the tens of seconds
for 10,000 targets. This might differ for a wide-area network.
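In MRTG's global configuration this is the Forks directive; a fragment
reflecting the setting described above:

```
Forks: 4
```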
* Place RRD files in a file-system of their own, ideally one
associated with separate logical volumes or disks. This gives the
system administrator flexibility to change mount options or other
file-system options. It also isolates the system activity data
(e.g. as displayed by sar) from unrelated activity.
* Consider mounting the file-system that contains the RRD files
with the ``noatime'' and ``nodiratime'' options so that RRD file
reads do not require an update to the file inode block. Of course
the effect of this is that file access times will be inaccurate,
but often these are not of interest for ``.rrd'' files.
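For example, a hypothetical /etc/fstab entry for a dedicated RRD
file-system (the device and mount point here are illustrative):

```
/dev/sdb1  /var/rrd  ext3  noatime,nodiratime  0  2
```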
* Consider enabling dir_index on ext file-systems to speed up lookups
in large directories. MRTG places all RRD files in the same directory,
and we've scaled that to hundreds of thousands of files.