[rrd-users] rrdcached issues with larger number of clients via network/pthread

Steve Shipway s.shipway at auckland.ac.nz
Sun Nov 21 10:27:57 CET 2010


I had this same memory problem and error message a while back after 4 days of running, but I had put it down to a couple of small memory leaks in the branch code (since fixed).

Can you indicate which rrdcached functions you are using -- i.e., is it just used for updates, or are you also using other functions such as last, create, info, etc. on a regular (not necessarily frequent) basis?  This would help to track down the problem.
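For reference, this is the kind of usage breakdown I mean. The socket path below is just an example (adjust it to your setup), and I'm only listing the commands as comments rather than running them against a live daemon:

```shell
# Hypothetical rrdcached socket address -- adjust to your installation.
SOCK="unix:/var/run/rrdcached.sock"

# Update-only usage routes everything through the daemon:
#   rrdtool update --daemon "$SOCK" example.rrd N:42

# But other functions can also be pointed at the daemon, e.g.:
#   rrdtool last --daemon "$SOCK" example.rrd
#   rrdtool info --daemon "$SOCK" example.rrd

# Knowing which of these you call regularly narrows down where the
# daemon might be leaking memory or threads.
echo "$SOCK"
```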

Another possibility is that the number of active threads has hit 1024 (PTHREAD_THREADS_MAX -- this can be increased only by recompiling the kernel).  I don't have enough intimate knowledge of rrdcached to tell if it is possible for it to be 'leaking' threads; I suppose that since there is a separate thread for each active client connection, plus the write threads, a large number of clients might cause this limit to be reached?  To check whether this is the case, use 'ps -L -p <rrdcached PID>' and count the number of threads for the rrdcached process.  For comparison, we have 15 on our server, and it has been running (with 1.4.trunk) for more than a week now with over 50 updates per second.
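To count the threads, something like this works (a sketch; I'm using the shell's own PID as a stand-in so the example runs on its own -- substitute the rrdcached PID):

```shell
# Count the light-weight processes (threads) of a given PID.
# $$ (the current shell) is used here only as a stand-in for the
# rrdcached PID.
PID=$$
NTHREADS=$(ps -L -p "$PID" --no-headers | wc -l)
echo "threads: $NTHREADS"
# If this number keeps climbing towards 1024 as clients come and go,
# the daemon is probably leaking connection threads.
```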

A separate issue is that, from what I can tell of the code, the rrd client is supposed to attempt a reconnect to the daemon if the remote daemon restarts and the connection dies.  However, it seems this doesn't always happen -- I've had to restart the MRTG daemon, and apparently you need to restart collectd when rrdcached is restarted.

Steve

Steve Shipway
University of Auckland ITS
UNIX Systems Design Lead
s.shipway at auckland.ac.nz
Ph: +64 9 373 7599 ext 86487
