[rrd-developers] rrdcached performance with >200k nodes

Wed Jan 13 17:39:35 CET 2010

On Wed, Jan 13, 2010 at 5:33 PM, kevin brintnall <kbrint at rufus.net> wrote:
> On Wed, Jan 13, 2010 at 05:21:57PM +0100, Mirek Lau?? wrote:
>> there are about 224k nodes in the tree; after issuing FLUSHALL
>> it takes about 20 minutes (with -t 8) to write almost all nodes
>
> This sounds about right.
>
>> we're now stuck at approx 10k nodes in queue,
>> journal continues to write, updates are received at normal rate
>
> Try issuing the "QUEUE" command to the daemon when in this state.  I just
> want to verify that you are seeing files continually added to the end of
> the queue, and being removed from the front.
>
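
For reference, the QUEUE check can be scripted. This is only a rough sketch, assuming the text protocol described in the rrdcached man page (a status line whose leading number is the count of lines that follow, then one "<num_vals> <file>" line per queued file). The socket path is taken from the command line shown further down; the function names and the sample reply used for testing are made up for illustration.

```python
import socket


def parse_queue_response(text):
    """Parse a QUEUE reply into (status, message, [(num_values, path), ...])."""
    lines = text.splitlines()
    head = lines[0].split(None, 1)
    status = int(head[0])
    message = head[1] if len(head) > 1 else ""
    if status < 0:
        # A negative status code signals an error from the daemon.
        raise RuntimeError(f"rrdcached error: {message}")
    entries = []
    for line in lines[1:1 + status]:
        num_values, path = line.split(None, 1)
        entries.append((int(num_values), path))
    return status, message, entries


def query_queue(sock_path="/tmp/rrdcached.sock"):
    """Send QUEUE over the daemon's UNIX socket and return the parsed reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(b"QUEUE\n")
        reader = sock.makefile("r", encoding="utf-8", errors="replace")
        first = reader.readline()
        status = int(first.split(None, 1)[0])
        # Read exactly as many lines as the status line announces.
        body = [reader.readline() for _ in range(max(status, 0))]
        return parse_queue_response(first + "".join(body))
```

Calling query_queue() twice a few seconds apart and comparing the first and last paths of the two results should show whether files are leaving the front of the queue while new ones arrive at the end.
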
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
>> 91510 portax     60  44    0   133M 90052K select 3   0:00 102.10% rrdcached
>
> What does 'top -H' show for this pid?  Is the CPU spread out evenly or
> dominated by one thread?  Since it's so close to 100%, I'm guessing one
> thread.  It would be nice to determine whether it's the thread that's
> handling your UPDATE (vs. the queue threads).

It's spread:

last pid: 57217;  load averages:  9.37,  9.09,  7.77    up 50+19:18:56  17:39:20
271 processes: 13 running, 239 sleeping, 19 zombie
CPU 0:     % user,     % nice,     % system,     % interrupt,     % idle
CPU 1:     % user,     % nice,     % system,     % interrupt,     % idle
CPU 2:     % user,     % nice,     % system,     % interrupt,     % idle
CPU 3:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 1270M Active, 14G Inact, 811M Wired, 324M Cache, 399M Buf, 244M Free
Swap: 8192M Total, 100K Used, 8192M Free

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
91510 portax    99    0   156M   111M RUN    1  10:30 15.19% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    1  10:29 14.89% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    3  10:31 14.79% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    1  10:32 14.70% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    1  10:27 14.70% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    1  10:29 14.36% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    0  10:32 13.96% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
91510 portax    99    0   156M   111M RUN    2  10:29 13.77% /usr/local/bin/rrdcached -l unix:/tmp/rrdcached.sock -w 3600
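
To put a number on "spread", here is a small sketch that sums the %CPU column of the thread rows above and reports how much of the total the busiest thread accounts for. The regular expression and the function name are just illustrative choices; the rows are the eight threads shown above with the command column left out.

```python
import re

# Matches a %CPU field such as "15.19%" in a top thread row.
CPU_FIELD = re.compile(r"(\d+(?:\.\d+)?)%")


def thread_cpu_shares(top_rows):
    """Return (total_cpu, largest_share) for per-thread rows of top -H."""
    values = []
    for row in top_rows:
        match = CPU_FIELD.search(row)
        if match:
            values.append(float(match.group(1)))
    if not values:
        raise ValueError("no CPU percentages found")
    total = sum(values)
    # largest_share is the busiest thread's fraction of the summed CPU.
    return total, max(values) / total


rows = [
    "91510 portax    99    0   156M   111M RUN    1  10:30 15.19%",
    "91510 portax    99    0   156M   111M RUN    1  10:29 14.89%",
    "91510 portax    99    0   156M   111M RUN    3  10:31 14.79%",
    "91510 portax    99    0   156M   111M RUN    1  10:32 14.70%",
    "91510 portax    99    0   156M   111M RUN    1  10:27 14.70%",
    "91510 portax    99    0   156M   111M RUN    1  10:29 14.36%",
    "91510 portax    99    0   156M   111M RUN    0  10:32 13.96%",
    "91510 portax    99    0   156M   111M RUN    2  10:29 13.77%",
]
total, largest = thread_cpu_shares(rows)
print(f"total {total:.2f}%, busiest thread {largest:.1%} of that")
```

For these rows the total is about 116% and the busiest thread holds roughly 13% of it, so no single thread dominates.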

>
> Also, can you repeat the earlier 'callgrind' output, but with --tree=both ?

Okay, will do so.

Look at the attached rrdcached stats.

Regards,
-ml
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cachedstats.png
Type: image/png
Size: 159482 bytes
Desc: not available
Url : http://lists.oetiker.ch/pipermail/rrd-developers/attachments/20100113/401728ae/attachment-0001.png