[rrd-developers] Improving RRD tool scalability
sasha at avalon-net.co.il
Mon Mar 3 20:13:23 MET 2003
Recently we have encountered some "interesting" problems while using an
RRDtool derivative for large-scale data collection.
Our RRDtool is based on rrdtool 1.0.28 with several modifications: bug
fixes, export to a database, percentile, STDDEV, and moving-average
functions, the ability to evaluate RPN without producing graphs, and
millisecond resolution. The rrd_update function, however, is essentially
the same.
22000 interfaces. Each interface has 10 datasources (in/out
octets, packets, errors, discards, plus availability and queue length).
Each interface is stored in a separate RRD file. The RRD files have
custom resolutions: most have a 180 s step, and a third of them have a
30 s step. Each file takes about 4 MB of disk space, ~100 GB total.
The system runs on a Sun V880 with four UltraSPARC III CPUs, 8 GB RAM,
and a 6-disk IBM storage array.
Data collection is done with our own frontend, a major rework of
Cricket with lots of cool stuff. The data collection can be done either
with several processes (~20) or with a smaller number of processes (3-5)
using SNMP slaves. The usual turnaround time is 120-300 seconds for all
the interfaces.
You can get our version of rrdtool at http://percival.sourceforge.net
At about 17K interfaces we found that collection had slowed to a crawl:
the turnaround time for all interfaces became much more than the required
300 s. Further investigation revealed that we were spending most of the
time waiting for disk, even after the usual Solaris tuning (we verified
the DNLC cache, inode cache, etc.). All to no avail.
After a source review we found the following problems:
- For every update we have to open and then close the file.
- We have to read the metadata (static header plus dynamic part) on
every update.
- rrdtool uses buffered stdio functions, although there is absolutely no
benefit from the buffering since the I/O is random. Also, Solaris does
not support more than 255 open files via stdio functions.
- The number of seeks and writes per update can be drastically reduced.
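The per-update open/close cost can be avoided by keeping descriptors open
for the life of the collector. A minimal sketch of such a descriptor
cache (the names and the fixed-size hash table are ours for illustration,
not from the actual RRDtool source):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define CACHE_SLOTS 4096

struct fd_slot {
    char path[256];
    int  fd;
    int  used;
};

static struct fd_slot cache[CACHE_SLOTS];

static unsigned hash_path(const char *p)
{
    unsigned h = 5381;
    while (*p)
        h = h * 33u + (unsigned char)*p++;
    return h % CACHE_SLOTS;
}

/* Return an open descriptor for path, opening the file only on first
 * use; later updates of the same RRD skip open()/close() entirely. */
int fd_cache_get(const char *path)
{
    struct fd_slot *s = &cache[hash_path(path)];

    if (s->used && strcmp(s->path, path) == 0)
        return s->fd;                 /* cache hit: no syscalls at all */
    if (s->used)
        close(s->fd);                 /* evict a colliding entry */
    s->fd = open(path, O_RDWR);
    if (s->fd < 0) {
        s->used = 0;
        return -1;
    }
    strncpy(s->path, path, sizeof s->path - 1);
    s->path[sizeof s->path - 1] = '\0';
    s->used = 1;
    return s->fd;
}
```

Note that the descriptor limit must be raised (ulimit -n / rlim_fd_max on
Solaris) for this to work with tens of thousands of files, which is also
why unbuffered descriptors are needed instead of Solaris stdio.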
We tried and tested several approaches on both Solaris and Linux. In
every case the file is opened only once and closed upon collector exit.
- Improved read/write: metadata are read once upon file open, and
pwrite() is used to write data back to the file. We verified that
pwrite() is faster than an lseek()/write() pair.
- Improved read/write with the metadata mmap'ed. We managed to get this
working on Linux only. Performance-wise this solution is about 20-30%
faster than the pure pwrite() variant.
- Fully mmap'ed file. This proved to be the worst possible idea. Again,
this was tried on Linux only. A possible reason is that msync() syncs a
full page, which is 4 KB, while pwrite() can write a single 512-byte
sector. This was confirmed in testing.
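The single-syscall write can be sketched as follows (our own naming; the
offsets are illustrative, not the actual RRD file layout):

```c
#include <sys/types.h>
#include <unistd.h>

/* Write one 8-byte data point at its slot in the archive region.
 * pwrite() combines the seek and the write into one system call and
 * leaves the shared file offset untouched, unlike lseek()+write(). */
ssize_t write_slot(int fd, off_t data_start, long slot, double value)
{
    off_t off = data_start + slot * (off_t)sizeof value;
    return pwrite(fd, &value, sizeof value, off);
}
```

Besides halving the syscall count, pwrite() never disturbs the file
offset, so several threads can safely update different slots of the same
descriptor.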
In the end we upgraded RRDtool, raised the number of available file
descriptors, and the problem magically went away. Our estimate is that
we can handle about 30-40K interfaces on the same hardware.
The bottom line is that RRDtool produces a lot of random I/O, and the
collection time is bounded by the average disk seek time multiplied by
the number of interfaces. Our modification reduced the number of seeks
several times over, but it did not remove that bound. In my opinion,
further advances in speed will require modification of the RRD file
format itself.
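A back-of-envelope calculation shows why collection is seek-bound. The
seeks-per-update and average-seek figures below are our assumptions for
illustration, not measurements from this setup:

```c
/* Lower bound on one collection pass if every seek is served by a
 * single spindle; seeks_per_update and avg_seek_ms are assumed values. */
double seek_bound_seconds(long interfaces, int seeks_per_update,
                          double avg_seek_ms)
{
    return interfaces * seeks_per_update * avg_seek_ms / 1000.0;
}
/* seek_bound_seconds(22000, 3, 8.0) gives 528 s, well over the 300 s
 * budget for one disk; spreading the I/O over 6 spindles brings the
 * bound under 90 s, which is why reducing seeks per update matters. */
```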
Also, I am very surprised that SNMP collection/CPU usage did not become
a bottleneck before the disk did; according to the RTG article it was
supposed to be a major one. On the other hand, our collector has about
the same performance as RTG even though it is written in Perl.
P.S. Note that our archive size is about 10-20 times bigger than the
MRTG default because we store more data at higher precision.
Avalon Net Ltd, CTO