[rrd-developers] rrdcached performance with >200k nodes

Mirek Lauš mirek.laus at gmail.com
Wed Jan 13 17:23:47 CET 2010


On Wed, Jan 13, 2010 at 5:21 PM, Mirek Lauš <mirek.laus at gmail.com> wrote:
> Kevin,
>
> On Wed, Jan 13, 2010 at 4:52 PM, kevin brintnall <kbrint at rufus.net> wrote:
>>> >> Hello list,
>>> >>
>>> >> we've probably reached rrdcached's limits in our monitoring system
>>> >>
>>> >> We had a very nicely running rrdcached while collecting from about 400 hosts,
>>> >> about 100k nodes (RRD files).
>>> >>
>>> >> We've bumped the number of hosts to about 2000, collecting interface
>>> >> traffic, errors, and unicast and multicast packets with a collector
>>> >> of our own. It batches the RRD updates using rrdcached's BATCH
>>> >> command over a unix socket, and is able to walk all the hosts in
>>> >> less than 5 minutes. The number of nodes is now about 200k.
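(For anyone unfamiliar with the BATCH command: a session over the unix socket looks roughly like the transcript below. The file paths and timestamps are made up; see rrdcached(1) for the exact protocol.)

```
client:  BATCH
server:  0 Go ahead.  End with dot '.' on its own line.
client:  UPDATE /var/db/rrd/host1/if_octets.rrd 1263399600:1234:5678
client:  UPDATE /var/db/rrd/host2/if_octets.rrd 1263399600:42:99
client:  .
server:  0 errors
```

The point of BATCH is that the server acknowledges the whole group at the end instead of each UPDATE individually, which cuts round-trips when pushing 200k updates per cycle.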
>>> >>
>>> >> rrdcached is configured with -w 3600 -z 3600 -f 7200 -t 8. Everything
>>> >> runs smoothly until the first timeout. Then the Queue value rises to
>>> >> the number of nodes and stays that high. The write rate is very low,
>>> >> disk IO is almost zero, and the CPU load from rrdcached gets very
>>> >> high (100-200%).
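(For reference, per rrdcached(1) those flags mean roughly the following; the listen socket path in this sketch is made up:)

```shell
# Hypothetical invocation matching the flags above:
#   -w 3600  write values older than 1 hour out to the RRD files
#   -z 3600  spread those writes over a random delay of up to 1 hour
#   -f 7200  every 2 hours, scan the whole tree and flush old values
#   -t 8     use 8 write threads
rrdcached -l unix:/var/run/rrdcached.sock -w 3600 -z 3600 -f 7200 -t 8
```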
>>> >>
>>> >> The system is FreeBSD 7.2-p4, amd64 with 16GB RAM, RAID10 disk array.
>>> >> rrdtool 1.4.2.
>>> >>
>>> >> Could it be we've reached rrdcached's limits? What can be done about it?
>>
>> Hi Mirek,
>>
>> I'm running a very similar setup to yours: FreeBSD 7/amd64, ~270k nodes, 5
>> minute interval.  I am using '-w 21600 -z 21600 -f 86400', and my
>> rrdcached is steady at ~1.5G RSS.
>>
>> Ideally you would cache at least one full page of writes per RRD file.
>> So, your ideal "-w" timer would be at least:
>>
>>        (RRD step interval)*(page size)/(RRD row size).
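Plugging in some example numbers (a 300-second step, 4 KiB filesystem pages, and a single data source, i.e. one 8-byte double per row; all of these are assumptions, not measurements from your setup):

```shell
# ideal -w lower bound = (step interval) * (page size) / (row size)
# 300s step, 4096-byte page, 8 bytes per row (one double per data source)
echo $((300 * 4096 / 8))   # seconds before a full page of updates accumulates
```

That works out to 153600 seconds (about 42 hours) per file before a full page of updates has accumulated; with more data sources per RRD the row is wider and the number drops proportionally.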
>>
>> I'm guessing at least part of your problem is IO limitations.  As Florian
>> said, this workload will see most of the disk's time used up seeking,
>> rather than writing. (try watching "gstat").
>>
>> As for the CPU, it's possible we have some problem that only exhibits
>> itself when there is a large queue.  However, I've never run into this.
>> We'll have to narrow the problem down a little more.
>>
>> When it's exhibiting this high CPU problem, does it continue to write to
>> the journal?  Are there an abnormal number of "FLUSH" or "WROTE" entries
>> at that time?
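(One quick way to check is to count command types in the journal files. The journal path and line format below are invented for illustration; the real journal lives wherever your -j directory points, and its lines carry a timestamp prefix followed by the command.)

```shell
# Count command types in a (sample) rrdcached journal.
# The file written here is fake data for illustration only.
journal=/tmp/rrdcached.journal.sample
cat > "$journal" <<'EOF'
1263399600.000000 UPDATE /var/db/rrd/host1/if_octets.rrd 1263399600:1:2
1263399600.100000 UPDATE /var/db/rrd/host2/if_octets.rrd 1263399600:3:4
1263399601.000000 WROTE /var/db/rrd/host1/if_octets.rrd
1263399602.000000 FLUSH /var/db/rrd/host2/if_octets.rrd
EOF
# second field is the command name; tally how often each appears
awk '{ print $2 }' "$journal" | sort | uniq -c
```

A sudden spike of FLUSH or WROTE lines relative to UPDATE lines at the moment the queue blows up would be a useful data point.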
>>
>> What do you mean by "until the first timeout"?
>>
>> P.S. I also use these sysctl values, FWIW, YMMV:
>>
>> vfs.ufs.dirhash_maxmem=16777216 # from 2097152
>> vfs.hirunningspace=4194304      # from 1048576
>>
>> --
>>  kevin brintnall =~ /kbrint at rufus.net/
>>
>
> there are about 224k nodes in the tree; after issuing FLUSHALL
> it takes about 20 minutes (with -t 8) to write almost all of them
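(Back-of-the-envelope, that flush rate is roughly:)

```shell
# ~224000 nodes flushed over ~20 minutes
echo $((224000 / (20 * 60)))   # nodes written per second overall
```

That is about 186 files/s, or roughly 23 per write thread at -t 8, which for random-access 5-minute RRD updates already sounds seek-bound.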
>
> we're now stuck at approx 10k nodes in the queue; the journal
> continues to be written, and updates are received at the normal rate
>
> gstat:
>
> # gstat -b
> dT: 1.000s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>    0      0      0      0    0.0      0      0    0.0    0.0  acd0
>    0      3      0      0    0.0      3     48    5.0    1.5  aacd0
>    0      3      0      0    0.0      3     48    5.1    1.5  aacd0s1
>    0      3      0      0    0.0      3     48    5.1    1.5  aacd0s1a
>    0      0      0      0    0.0      0      0    0.0    0.0  aacd0s1b
>    0      0      0      0    0.0      0      0    0.0    0.0  aacd0s1c
>
> top:
>
> last pid: 28506;  load averages:  9.93,  6.60,  4.37    up 50+19:00:19  17:20:43
> 271 processes: 9 running, 243 sleeping, 19 zombie
> CPU 0:     % user,     % nice,     % system,     % interrupt,     % idle
> CPU 1:     % user,     % nice,     % system,     % interrupt,     % idle
> CPU 2:     % user,     % nice,     % system,     % interrupt,     % idle
> CPU 3:     % user,     % nice,     % system,     % interrupt,     % idle
> Mem: 1121M Active, 14G Inact, 805M Wired, 320M Cache, 399M Buf, 490M Free
> Swap: 8192M Total, 100K Used, 8192M Free
>
>  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
> 91510 portax     60  44    0   133M 90052K select 3   0:00 102.10% rrdcached
>
> Regards,
> Mirek
>

and yes - I also have changed vfs parameters a bit:

# sysctl -a | egrep "(hirunning|dirhash)"
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 11031892
vfs.ufs.dirhash_maxmem: 33554432
vfs.ufs.dirhash_minsize: 2560
vfs.hirunningspace: 4194304

-ml
