[rrd-users] rrdcached issues with larger number of clients via network/pthread

Ulf Zimmermann ulf at openlane.com
Sun Oct 31 06:33:16 CET 2010


I have close to 300 machines running collectd, configured to use unixsocks to rrdcached on a central server. We are increasingly running into threads dying (collectd then starts complaining and fills up /var/messages), and when we try to restart collectd, sometimes it works and sometimes we end up with:

Oct 30 22:27:19 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
Oct 30 22:27:34 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
Oct 30 22:28:10 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.

At this point we usually have to restart the rrdcached daemon, which in turn means restarting collectd on close to 300 machines.

How can this be debugged to find the issue (potentially inside pthreads)? The central server is running RedHat EL5 Update 4, and rrdtool/rrdcached is 1.4.4 from rpmforge.
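
One possible starting point, assuming pthread_create is failing because some resource limit has been reached (the log lines above do not say which error it returned), is to compare the daemon's current thread count with the limits in effect. The sketch below is only an illustration for a Linux /proc filesystem; the helper names thread_count, format_limit and report are made up for this example and are not part of rrdtool or collectd. Note that getrlimit reports the limits of the process running the script, so it would need to be run as the same user and from the same environment that starts rrdcached.

```python
import os
import resource
import sys


def thread_count(pid):
    """Return the 'Threads:' value from /proc/<pid>/status, or None if unavailable."""
    try:
        with open("/proc/%d/status" % pid) as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except (IOError, OSError):
        # Process does not exist, or /proc is not available on this system.
        return None
    return None


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value)


def report(pid):
    """Print the thread count of pid next to the limits of the current process."""
    print("threads in pid %d: %s" % (pid, thread_count(pid)))
    for name in ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        print("%s: soft=%s hard=%s" % (name, format_limit(soft), format_limit(hard)))


if __name__ == "__main__":
    # Hypothetical usage: pass the rrdcached pid (e.g. 16864) as the first argument.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    report(target)
```

If the thread count sits close to the process limit, or the stack size limit multiplied by the thread count approaches the address space available to the daemon, that would point toward resource exhaustion rather than a bug inside pthreads. This is a guess to be checked, not a diagnosis.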

Ulf, who is getting more grey hair by the minute with issues like this :-(


