[rrd-developers] rrdcached performance with >200k nodes
Mirek Lauš
mirek.laus at gmail.com
Wed Jan 13 17:23:47 CET 2010
On Wed, Jan 13, 2010 at 5:21 PM, Mirek Lauš <mirek.laus at gmail.com> wrote:
> Kevin,
>
> On Wed, Jan 13, 2010 at 4:52 PM, kevin brintnall <kbrint at rufus.net> wrote:
>>> >> Hello list,
>>> >>
>>> >> we've probably reached rrdcached limits in our monitoring system
>>> >>
>>> >> We had a very nicely running rrdcached while collecting from about 400 hosts,
>>> >> about 100k nodes (RRD files).
>>> >>
>>> >> We've bumped the number of hosts to about 2000, collecting interface
>>> >> traffic, errors, and unicast and multicast packet counters with a
>>> >> collector of our own. It batches the RRD updates using rrdcached's
>>> >> BATCH command over a unix socket, and it is able to walk all the
>>> >> hosts in less than 5 minutes. The number of nodes is about 200k.
>>> >>
>>> >> rrdcached is configured with -w 3600 -z 3600 -f 7200 -t 8. Everything
>>> >> runs smoothly until the first timeout; then the Queue value rises to
>>> >> the number of nodes and stays that high. The write rate is very low,
>>> >> disk IO is almost zero, and the CPU load from rrdcached gets very
>>> >> high (100-200%).
>>> >>
>>> >> The system is FreeBSD 7.2-p4, amd64 with 16GB RAM, RAID10 disk array.
>>> >> rrdtool 1.4.2.
>>> >>
>>> >> Could it be we've reached rrdcached's limits? What can be done about it?
>>
>> Hi Mirek,
>>
>> I'm running a very similar setup to yours: FreeBSD 7/amd64, ~270k nodes, 5
>> minute interval. I am using '-w 21600 -z 21600 -f 86400', and my
>> rrdcached is steady at ~1.5G RSS.
>>
>> Ideally you would cache at least one full page of writes per RRD file.
>> So, your ideal "-w" timer would be at least:
>>
>> (RRD step interval)*(page size)/(RRD row size).
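[Plugging illustrative numbers into that formula (these values are assumed, not taken from the thread): a 300-second step, a 4 KiB page, and 8 bytes per row, i.e. one double per update in a single-DS RRA:]

```python
# Minimum useful -w per the formula above, with assumed numbers.
step = 300        # seconds between RRD updates
page_size = 4096  # bytes per page, the minimum worth one seek
row_size = 8      # bytes appended to the RRA per update

min_w = step * page_size // row_size
print(min_w)                # 153600 seconds
print(min_w / 3600)         # i.e. about 42.7 hours of cached updates
```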
>>
>> I'm guessing at least part of your problem is IO limitations. As Florian
>> said, this workload will see most of the disk's time used up seeking,
>> rather than writing. (try watching "gstat").
>>
>> As for the CPU, it's possible we have some problem that only exhibits
>> itself when there is a large queue. However, I've never run into this.
>> We'll have to narrow the problem down a little more.
>>
>> When it's exhibiting this high CPU problem, does it continue to write to
>> the journal? Are there an abnormal number of "FLUSH" or "WROTE" entries
>> at that time?
>>
>> What do you mean by "until the first timeout"?
>>
>> P.S. I also use these sysctl values, FWIW, YMMV:
>>
>> vfs.ufs.dirhash_maxmem=16777216 # from 2097152
>> vfs.hirunningspace=4194304 # from 1048576
>>
>> --
>> kevin brintnall =~ /kbrint at rufus.net/
>>
>
> There are about 224k nodes in the tree; after issuing FLUSHALL
> it takes about 20 minutes (with -t 8) to write almost all of them.
>
> We're now stuck at approx. 10k nodes in the queue; the
> journal continues to write and updates arrive at the normal rate.
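[For scale, the drain rate implied by those figures works out (my arithmetic, not the poster's) to:]

```python
# Back-of-envelope flush throughput: 224k nodes drained in ~20 minutes
# by -t 8 write threads.
nodes = 224_000
drain_seconds = 20 * 60
threads = 8

files_per_sec = nodes / drain_seconds
print(round(files_per_sec))               # ~187 RRD files/s overall
print(round(files_per_sec / threads, 1))  # ~23.3 files/s per write thread
```

At roughly 23 files per second per thread, each write costs on the order of 40 ms, which is consistent with a seek-bound workload rather than a bandwidth-bound one.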
>
> gstat:
>
> # gstat -b
> dT: 1.000s w: 1.000s
> L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
> 0 0 0 0 0.0 0 0 0.0 0.0 acd0
> 0 3 0 0 0.0 3 48 5.0 1.5 aacd0
> 0 3 0 0 0.0 3 48 5.1 1.5 aacd0s1
> 0 3 0 0 0.0 3 48 5.1 1.5 aacd0s1a
> 0 0 0 0 0.0 0 0 0.0 0.0 aacd0s1b
> 0 0 0 0 0.0 0 0 0.0 0.0 aacd0s1c
>
> top:
>
> last pid: 28506; load averages: 9.93, 6.60, 4.37
>
> up 50+19:00:19 17:20:43
> 271 processes: 9 running, 243 sleeping, 19 zombie
> CPU 0: % user, % nice, % system, % interrupt, % idle
> CPU 1: % user, % nice, % system, % interrupt, % idle
> CPU 2: % user, % nice, % system, % interrupt, % idle
> CPU 3: % user, % nice, % system, % interrupt, % idle
> Mem: 1121M Active, 14G Inact, 805M Wired, 320M Cache, 399M Buf, 490M Free
> Swap: 8192M Total, 100K Used, 8192M Free
>
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND
> 91510 portax 60 44 0 133M 90052K select 3 0:00 102.10% rrdcached
>
> Regards,
> Mirek
>
And yes - I've also changed the vfs parameters a bit:
# sysctl -a | egrep "(hirunning|dirhash)"
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 11031892
vfs.ufs.dirhash_maxmem: 33554432
vfs.ufs.dirhash_minsize: 2560
vfs.hirunningspace: 4194304
-ml