[rrd-users] rrdcached issues with larger number of clients via network/pthread

Ulf Zimmermann ulf at openlane.com
Sun Nov 21 10:53:25 CET 2010


I use it via collectd, which should only be doing updates. Graphing happens through rrdtool itself, directly on the files. Currently I have 275 connections (as per netstat). It runs as:

collectd  2515     1 20 Nov17 ?        16:50:17 //opt/rrdtool-1.4.4.002147/bin/rrdcached -p /var/rrdtool/rrdcached/rrdcached.pid -w 600 -z 300 -l 10.21.0.43 -p /data/rrdcached/run/rrdcached.pid -l /data/rrdcached/run/rrdcached.sock -j /data/rrdcached/journal -b /data/rrdcached/data

Top shows it as:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                        
 2515 collectd  15   0 3000m 182m  896 S 19.9  1.1   1010:22 rrdcached                                                      

VIRT is currently bouncing between 2997 and 3000 MB. I think it was initially around 2,776 MB after I started the newly compiled rrdcached and then restarted all the collectd instances (I need to get something in place that does that automatically).
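As a rough cross-check of the figures above, the sketch below estimates how much virtual address space thread stacks alone would reserve. It assumes one thread per client connection, a per-thread stack reservation of 10 MiB (an assumed default, not a measured value; `ulimit -s` shows the real limit), and roughly 3 GiB of user address space for a 32-bit process. The function names are hypothetical.

```python
# Back-of-the-envelope estimate of address space reserved by thread stacks.
# Assumptions (not measured): one thread per connection, 10 MiB of stack
# reserved per thread, about 3 GiB of user address space on 32-bit Linux.

def estimate_stack_reservation_mib(threads, stack_mib=10):
    """Virtual memory in MiB reserved by thread stacks of a fixed size."""
    return threads * stack_mib


def headroom_mib(threads, stack_mib=10, address_space_mib=3072):
    """Address space in MiB left over after the stack reservations."""
    return address_space_mib - estimate_stack_reservation_mib(threads, stack_mib)


if __name__ == "__main__":
    connections = 275  # connection count reported by netstat
    print(estimate_stack_reservation_mib(connections))  # 2750
    print(headroom_mib(connections))                    # 322
```

If those assumptions hold, the stacks alone would account for about 2750 MiB, which is in the same range as the VIRT value top reports and would leave little room for mmap in a 32-bit process.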

The last few times I have looked, it ran out of memory as far as I can tell, failing either to create a new pthread or to mmap a file:

Nov 17 13:37:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
Nov 17 13:39:04 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd': Cannot allocate memory)
Nov 17 13:41:34 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd': Cannot allocate memory)
Nov 17 13:47:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
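To see how often each kind of failure occurs, the syslog lines can be tallied. This is a minimal sketch that assumes the message layout shown in the excerpt above; the category names and function names are hypothetical, and the two sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Matches "rrdcached[PID]: function_name: message" as seen in the excerpt.
PATTERN = re.compile(r"rrdcached\[\d+\]: (?P<func>\w+): (?P<msg>.*)")

SAMPLE = [
    "Nov 17 13:37:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.",
    "Nov 17 13:39:04 log02 rrdcached[21009]: queue_thread_main: rrd_update_r "
    "(/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd) failed "
    "with status -1. (mmaping file "
    "'/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd': "
    "Cannot allocate memory)",
]


def classify(line):
    """Return a failure category for an rrdcached log line, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    msg = match.group("msg")
    if "pthread_create failed" in msg:
        return "pthread_create"
    if "Cannot allocate memory" in msg:
        return "mmap_enomem"
    return "other"


def tally(lines):
    """Count log lines per failure category, skipping non-rrdcached lines."""
    return Counter(c for c in map(classify, lines) if c is not None)


if __name__ == "__main__":
    for category, count in tally(SAMPLE).items():
        print(category, count)
```

Feeding the whole syslog file through `tally` would show whether thread creation or mmap fails first, which may help tell thread exhaustion apart from general address space exhaustion.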

I need to figure out how to move all this to a 64-bit machine; this is currently just EL5 i386. Initially I was going to install it as 64-bit (the machine has 16GB), but due to issues with the rrd file format differing between i386 and x86_64, I ended up using i386. Since then I have moved everything either onto this machine locally (collectd and some other collectors) or to collectd/rrdcached for remote machines, so I could switch to x86_64, but I would have to convert all the files when I do that.
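The usual way to carry RRD files across architectures is `rrdtool dump` to XML on the old system followed by `rrdtool restore` on the new one. The sketch below covers the dump half for a whole tree. The helper names and the `/tmp/rrd-xml` output directory are hypothetical, and it assumes `rrdtool` is on the PATH and that pending updates have already been flushed from rrdcached to disk.

```python
import os
import subprocess


def xml_path_for(rrd_path, data_dir, xml_dir):
    """Map an .rrd path under data_dir to a mirrored .xml path under xml_dir."""
    rel = os.path.relpath(rrd_path, data_dir)
    return os.path.join(xml_dir, os.path.splitext(rel)[0] + ".xml")


def find_rrd_files(data_dir):
    """Yield every .rrd file below data_dir."""
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for name in sorted(filenames):
            if name.endswith(".rrd"):
                yield os.path.join(dirpath, name)


def dump_tree(data_dir, xml_dir):
    """Run `rrdtool dump` on each file, writing the XML to the mirrored path."""
    for src in find_rrd_files(data_dir):
        dst = xml_path_for(src, data_dir, xml_dir)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        with open(dst, "wb") as out:
            # rrdtool dump writes the XML representation to stdout.
            subprocess.run(["rrdtool", "dump", src], stdout=out, check=True)


if __name__ == "__main__":
    # /tmp/rrd-xml is a placeholder destination, not a path from this setup.
    dump_tree("/data/rrdcached/data", "/tmp/rrd-xml")
```

On the x86_64 side, each XML file would then be turned back into an RRD with `rrdtool restore file.xml file.rrd`.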

If it weren't also my central syslog server, I would potentially just reinstall it.



> -----Original Message-----
> From: Steve Shipway [mailto:s.shipway at auckland.ac.nz]
> Sent: Sunday, November 21, 2010 1:28 AM
> To: Ulf Zimmermann
> Cc: 'rrd-users at lists.oetiker.ch'
> Subject: RE: [rrd-users] rrdcached issues with larger number of clients
> via network/pthread
> 
> I had this same memory problem and error message a while back after 4
> days of running, but had thought it to be due to a couple of small
> memory leaks in the branch code (since fixed).
> 
> Can you indicate which rrdcached functions you are using -- ie, is it
> just used for update, or are you also using other functions like last,
> create, info, etc on a regular (not necessarily frequent) basis?  This
> would help to track down problems.
> 
> Another possibility is that the number of active threads has hit 1024
> (PTHREAD_THREADS_MAX -- this can be increased only by recompiling the
> kernel).  I don't have enough intimate knowledge of rrdcached to tell
> if it is possible for it to be 'leaking' threads; I suppose that since
> you have a separate thread for each active client connection, plus the
> write threads, a large number of clients might cause this to be
> reached?  To tell if this is it, use 'ps -L -p <rrdcached PID>' and
> count the number of threads for the rrdcached process.  For comparison,
> we have 15 on our server, and it has been running (with 1.4.trunk) for
> more than a week now with over 50 updates per second.
> 
> A separate issue is that, from what I can tell of the code, the rrd
> client is supposed to attempt a re-connect to the daemon in the event
> of the remote daemon restarting and the connection dying.  However it
> does seem that this doesn't necessarily happen -- I've had to restart
> the MRTG daemon, and you apparently need to restart collectd when the
> rrdcached is restarted.
> 
> Steve
> 
> Steve Shipway
> University of Auckland ITS
> UNIX Systems Design Lead
> s.shipway at auckland.ac.nz
> Ph: +64 9 373 7599 ext 86487


