[rrd-users] rrdcached issues with larger number of clients via network/pthread

Ulf Zimmermann ulf at openlane.com
Sun Oct 31 07:31:31 CET 2010


I guess I might be running into a memory limit :-(

16864 collectd  15   0 3054m 149m  780 S 20.3  0.9 382:06.91 rrdcached

I need to build a current list of machines again, stop rrdcached, then ssh to all machines and restart collectd.
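
To track the numbers in the top line above over time (virtual size, resident size) along with the thread count, something like the following could be run periodically against the rrdcached PID. This is only a minimal sketch, assuming a Linux /proc filesystem; the helper names parse_proc_status and read_proc_status are made up for illustration and are not part of rrdtool or collectd.

```python
import os

# Fields of interest in /proc/<pid>/status (Linux only).
FIELDS = ("VmSize", "VmRSS", "Threads")


def parse_proc_status(text):
    """Extract the selected fields from the text of /proc/<pid>/status."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            result[key] = value.strip()
    return result


def read_proc_status(pid):
    """Read and parse /proc/<pid>/status for the given process ID."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_proc_status(handle.read())


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Demonstration on the current process; pass the rrdcached PID instead.
    print(read_proc_status(os.getpid()))
```

Logging this once a minute would show whether the thread count or the virtual size keeps climbing before pthread_create starts to fail.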


> -----Original Message-----
> From: Steve Shipway [mailto:s.shipway at auckland.ac.nz]
> Sent: Saturday, October 30, 2010 11:17 PM
> To: Ulf Zimmermann; 'rrd-users at lists.oetiker.ch'
> Subject: RE: rrdcached issues with larger number of clients via
> network/pthread
> 
> We also had this problem, using the trunk version of 1.4.4, but I
> thought it was due to a separate memory leak issue (now fixed in
> trunk).
> 
> If pthread_create fails then you're out of resources, possibly memory
> or threads. Check that your system thread/process limits are not
> causing issues, and that you have sufficient memory. Keep an eye on
> how much memory rrdcached is using and see whether it grows far beyond
> what it uses after the first hour. I've also found that the MRTG
> daemon (which is writing to rrdcached) can become confused and require
> restarting at this point, too.
> 
> Steve
> 
> Steve Shipway
> University of Auckland ITS
> UNIX Systems Design Lead
> s.shipway at auckland.ac.nz
> Ph: +64 9 373 7599 ext 86487
> 
> 
> ________________________________________
> From: rrd-users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch [rrd-
> users-bounces+s.shipway=auckland.ac.nz at lists.oetiker.ch] on behalf of
> Ulf Zimmermann [ulf at openlane.com]
> Sent: Sunday, 31 October 2010 6:33 p.m.
> To: 'rrd-users at lists.oetiker.ch'
> Subject: [rrd-users] rrdcached issues with larger number of clients via
> network/pthread
> 
> I have close to 300 machines running collectd, configured to use
> unixsocks to rrdcached on a central server. More and more often we
> see threads dying (collectd then starts complaining and fills up
> /var/messages), and when we try to restart collectd, sometimes it
> works and sometimes we end up with:
> 
> Oct 30 22:27:19 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
> Oct 30 22:27:34 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
> Oct 30 22:28:10 log02 rrdcached[16864]: listen_thread_main: pthread_create failed.
> 
> And at this point we usually have to restart the rrdcached daemon,
> which then means having to restart collectd on close to 300 machines.
> 
> How can this be debugged to find the issue (potentially inside
> pthreads)? The central server is running RedHat EL5 Update 4, and
> rrdtool/rrdcached is 1.4.4 from rpmforge.
> 
> Ulf, who is getting more grey hair by the minute with issues like this
> :-(
> 
> _______________________________________________
> rrd-users mailing list
> rrd-users at lists.oetiker.ch
> https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users

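The suggestion in the quoted reply to check thread/process limits can be sketched roughly as below. The script prints the limits of the process that runs it, so it would need to be started as the same user and from the same environment as rrdcached to be meaningful. The stack arithmetic rests on two assumptions that are not confirmed anywhere in this thread: that rrdcached uses one thread per client connection, and that each thread reserves a 10 MiB stack. Under those assumptions, 300 clients would reserve about 3000 MiB of address space, which is in the same range as the 3054m virtual size shown in the top output.

```python
import resource


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return (soft, hard) limits of this process for a few resources."""
    names = ("RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS")
    limits = {}
    for name in names:
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (describe(soft), describe(hard))
    return limits


def stack_reservation_mib(thread_count, stack_bytes):
    """Address space reserved for thread stacks, in MiB."""
    return thread_count * stack_bytes / (1024.0 * 1024.0)


if __name__ == "__main__":
    for name, (soft, hard) in sorted(current_limits().items()):
        print("%s soft=%s hard=%s" % (name, soft, hard))
    # Assumed values: 300 client threads, 10 MiB stack each.
    print("%.0f MiB" % stack_reservation_mib(300, 10 * 1024 * 1024))
```

If the estimate is close to the address space available to the process, a smaller per-thread stack or fewer simultaneous connections would be the things to look at first.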

