[rrd-developers] rrdcached performance with >200k nodes

Tobias Oetiker tobi at oetiker.ch
Wed Jan 13 09:48:16 CET 2010


Hi Mirek,

Today Mirek Lauš wrote:

> Hello list,
>
> we've probably reached rrdcached limits in our monitoring system
>
> We had a very nicely running rrdcached while collecting from about 400 hosts,
> about 100k nodes (RRD files).
>
> We've bumped the number of hosts to about 2000 for interface
> traffic, errors, unicast and multicast packets, using a collector of our
> own. It batches the RRD updates using rrdcached's BATCH command via a
> unix socket. This collector is able to walk all the hosts in less than
> 5 minutes. The number of nodes is about 200k.
>
> The rrdcached is configured with -w 3600 -z 3600 -f 7200 -t 8. Everything
> runs smoothly until the first timeout. Then the Queue value rises to the
> number of nodes and stays that high. The write rate is very low and disk
> IO is almost zero. CPU load from rrdcached gets very high (100-200%).
>
> The system is FreeBSD 7.2-p4, amd64 with 16GB RAM, RAID10 disk array.
> rrdtool 1.4.2.
>
> Could it be we've reached rrdcached's limits? What can be done about it?

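For readers who have not used BATCH mode, here is a minimal sketch of what such a collector might send over the unix socket. It is not Mirek's collector. The socket path and RRD file names are hypothetical, and the reply handling is deliberately naive (a single recv), so treat it as an illustration of the wire format only.

```python
import socket


def build_batch(updates):
    """Return the bytes of one rrdcached BATCH request.

    `updates` is an iterable of (rrd_path, timestamp, values) tuples.
    The paths used by callers here are hypothetical examples.
    """
    lines = ["BATCH"]
    for path, ts, values in updates:
        fields = ":".join(str(v) for v in values)
        lines.append(f"UPDATE {path} {ts}:{fields}")
    lines.append(".")  # a lone dot on its own line ends the batch
    return ("\n".join(lines) + "\n").encode("ascii")


def send_batch(sock_path, updates):
    """Send one batch and return whatever the daemon has replied so far.

    Assumption: sock_path points at a running rrdcached unix socket,
    e.g. a hypothetical /var/run/rrdcached.sock. A real client would
    read and parse the status lines instead of doing a single recv().
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(build_batch(updates))
        return s.recv(4096).decode("ascii", "replace")
```

Building the payload separately from sending it makes it easy to inspect exactly what goes to the daemon, which may help when comparing collector behaviour against a profile of rrdcached.
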
I am not running a huge rrdcached setup myself, but what I gather
from other posts is that this is NOT the limit; there must be other
issues at play. Can you do a profiling run to identify the hotspot
in rrdcached? The limit is reached when the system becomes disk-IO
bound; if it becomes CPU bound instead, then there is a bug somewhere.
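
To put rough numbers on the disk side, here is a back-of-the-envelope sketch using the figures from the report. It assumes a fixed 300-second collection cycle (the report only says "less than 5 minutes") and ignores the -z jitter and the -f flush interval, so the results are steady-state averages, not measurements.

```python
def expected_rates(nodes, write_timeout, poll_interval):
    """Rough steady-state averages for an rrdcached setup.

    nodes          number of RRD files
    write_timeout  the -w value in seconds
    poll_interval  assumed seconds between updates per file
    """
    updates_per_sec = nodes / poll_interval        # incoming UPDATEs
    file_writes_per_sec = nodes / write_timeout    # files flushed to disk
    values_per_write = write_timeout / poll_interval
    return updates_per_sec, file_writes_per_sec, values_per_write


# 200k nodes and -w 3600 come from the report; 300 s is an assumption.
updates, writes, per_write = expected_rates(200_000, 3600, 300)
print(f"{updates:.1f} updates/s, {writes:.1f} file writes/s, "
      f"{per_write:.0f} values per write")
```

Under those assumptions the daemon would need to flush on the order of 56 files per second once the first timeout expires, which is the figure to compare against what the disk array can sustain.
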

cheers
tobi

ps. I am moving this to rrd-developers

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900

