[rrd-users] rrdcached issues with larger number of clients via network/pthread

Sun Nov 21 10:55:14 CET 2010

Oh, and the thread count is:

log02 root /home/ulf # ps -L -p 2515 | wc -l
282

So 281, not counting the ps header line.
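On Linux the same number can be read without a header line to subtract, because `/proc/<pid>/task` holds one directory per thread. This is a minimal sketch; the helper name `count_threads` is hypothetical, and the PID defaults to the calling shell.

```shell
# Count the threads of a Linux process by listing /proc/<pid>/task,
# which contains one directory per thread (so no header to subtract).
# count_threads is a hypothetical helper name; PID defaults to this shell.
count_threads() {
    pid="${1:-$$}"
    ls "/proc/$pid/task" | wc -l
}
```

Called as `count_threads 2515`, it should agree with the ps count minus its header.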

> -----Original Message-----
> From: rrd-users-bounces+ulf=atc-onlane.com at lists.oetiker.ch
> [mailto:rrd-users-bounces+ulf=atc-onlane.com at lists.oetiker.ch] On
> Behalf Of Ulf Zimmermann
> Sent: Sunday, November 21, 2010 1:53 AM
> To: 'Steve Shipway'
> Cc: 'rrd-users at lists.oetiker.ch'
> Subject: Re: [rrd-users] rrdcached issues with larger number of clients
> via network/pthread
> 
> I use it via collectd, and that should only be doing updates. Graphing
> happens through rrdtool itself, directly on the files. Currently I have
> 275 connections (as per netstat). It runs as:
> 
> collectd  2515     1 20 Nov17 ?        16:50:17 //opt/rrdtool-1.4.4.002147/bin/rrdcached
> -p /var/rrdtool/rrdcached/rrdcached.pid -w 600 -z 300 -l 10.21.0.43
> -p /data/rrdcached/run/rrdcached.pid -l /data/rrdcached/run/rrdcached.sock
> -j /data/rrdcached/journal -b /data/rrdcached/data
> 
> Top shows it as:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  2515 collectd  15   0 3000m 182m  896 S 19.9  1.1   1010:22 rrdcached
> 
> VIRT is currently bouncing between 2997m and 3000m. It was initially
> around 2776m, I think, after I started the newly compiled rrdcached and
> then restarted all the collectd instances (I need to put something in
> place that does that automatically).
> 
> The last few times I looked, it had run out of memory as far as I can
> tell, failing either to create a new pthread or to mmap a file:
> 
> Nov 17 13:37:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
> Nov 17 13:39:04 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/disk-cciss_c0d2/disk_time.rrd': Cannot allocate memory)
> Nov 17 13:41:34 log02 rrdcached[21009]: queue_thread_main: rrd_update_r (/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd) failed with status -1. (mmaping file '/data/rrdcached/data/co-db02.autc.com/interface/if_octets-sit0.rrd': Cannot allocate memory)
> Nov 17 13:47:40 log02 rrdcached[21009]: listen_thread_main: pthread_create failed.
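A back-of-the-envelope estimate of how much address space thread stacks alone could reserve, as a hedged sketch. It assumes every rrdcached thread gets the glibc NPTL default stack, which is taken from RLIMIT_STACK (commonly 8192 KiB; check with `ulimit -s`), and that rrdcached does not set its own stack size. Neither figure was measured in this thread, and `stack_reservation_mib` is a hypothetical helper name.

```shell
# Estimate address space reserved for thread stacks, in MiB.
# ASSUMPTION: each thread reserves the default NPTL stack (RLIMIT_STACK,
# as reported by `ulimit -s`, in KiB). This is an estimate, not a measurement.
stack_reservation_mib() {
    threads=$1
    stack_kib=$2
    echo $(( threads * stack_kib / 1024 ))
}

# 281 threads (from the ps count above) with an assumed 8192 KiB stack each.
stack_reservation_mib 281 8192   # prints 2248
```

If that assumption holds, about 2.2 GiB of a 32-bit process's user address space (typically around 3 GiB on i386) would be spoken for before any heap or mmapped RRD files, which would be consistent with VIRT sitting near 3000m; on x86_64 the same reservation would be far from any limit.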
> 
> I need to figure out how to move all of this to a 64-bit machine; this
> one is currently just EL5 i386. Initially I was going to install it as
> 64-bit (the machine has 16GB), but due to issues with RRD and the
> different file format between i386 and x86_64, I ended up using i386.
> Since then I have moved everything either onto this machine locally
> (collectd and some other collectors) or onto collectd/rrdcached for
> remote machines, so I could switch to x86_64, but I would have to
> convert all the files when I do that.
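Because the binary RRD layout differs between i386 and x86_64, one common route is `rrdtool dump` on the old host (which writes architecture-independent XML) followed by `rrdtool restore` on the new one. A sketch, assuming rrdcached is stopped first so pending updates have been written to disk; `DATA_DIR` is the `-b` directory from the command line above, while `XML_DIR`, `export_all` and `import_all` are hypothetical names.

```shell
# Convert RRD files between architectures via XML (sketch).
# DATA_DIR matches rrdcached's -b option; XML_DIR is a hypothetical staging area.
DATA_DIR=/data/rrdcached/data
XML_DIR=/data/rrd-xml

# Step 1, on the old i386 host: dump each .rrd to XML, keeping the directory layout.
export_all() {
    find "$DATA_DIR" -type f -name '*.rrd' | while IFS= read -r rrd; do
        rel=${rrd#"$DATA_DIR"/}
        mkdir -p "$XML_DIR/$(dirname "$rel")"
        rrdtool dump "$rrd" > "$XML_DIR/${rel%.rrd}.xml"
    done
}

# Step 2, on the new x86_64 host after copying XML_DIR over: rebuild native .rrd files.
import_all() {
    find "$XML_DIR" -type f -name '*.xml' | while IFS= read -r xml; do
        rel=${xml#"$XML_DIR"/}
        mkdir -p "$DATA_DIR/$(dirname "$rel")"
        rrdtool restore "$xml" "$DATA_DIR/${rel%.xml}.rrd"
    done
}
```

The `read` loop assumes file names without embedded newlines, which holds for collectd-style paths like the ones in the log above.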
> 
> If it weren't also my central syslog server, I would potentially just
> reinstall it.
> 
> 
> 
> > -----Original Message-----
> > From: Steve Shipway [mailto:s.shipway at auckland.ac.nz]
> > Sent: Sunday, November 21, 2010 1:28 AM
> > To: Ulf Zimmermann
> > Cc: 'rrd-users at lists.oetiker.ch'
> > Subject: RE: [rrd-users] rrdcached issues with larger number of
> > clients via network/pthread
> >
> > I had this same memory problem and error message a while back after 4
> > days of running, but had thought it to be due to a couple of small
> > memory leaks in the branch code (since fixed).
> >
> > Can you indicate which rrdcached functions you are using, i.e. is it
> > just used for update, or are you also using other functions like last,
> > create, info, etc. on a regular (not necessarily frequent) basis? This
> > would help to track down problems.
> >
> > Another possibility is that the number of active threads has hit 1024
> > (PTHREAD_THREADS_MAX; this can be increased only by recompiling the
> > kernel). I don't have enough intimate knowledge of rrdcached to tell
> > if it is possible for it to be 'leaking' threads; I suppose that since
> > you have a separate thread for each active client connection, plus the
> > write threads, a large number of clients might cause this limit to be
> > reached? To tell if this is it, use 'ps -L -p <rrdcached PID>' and
> > count the number of threads for the rrdcached process. For comparison,
> > we have 15 on our server, and it has been running (with 1.4.trunk) for
> > more than a week now with over 50 updates per second.
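Alongside the thread count, the Linux limits that commonly bound thread creation can be read on the running system. A read-only sketch for bash (it uses `ulimit -u`, which not every `sh` supports); `show_thread_limits` is a hypothetical name, and it does not check the PTHREAD_THREADS_MAX figure mentioned above.

```shell
# Print limits that commonly bound thread creation on Linux (read-only).
# ulimit reports the limits of the current shell, not of a running daemon.
show_thread_limits() {
    # System-wide maximum number of threads (tasks).
    echo "threads-max: $(cat /proc/sys/kernel/threads-max)"
    # Per-user process limit (RLIMIT_NPROC); on Linux threads count toward it.
    echo "max user processes: $(ulimit -u)"
    # Default stack size in KiB (RLIMIT_STACK), which glibc uses for new threads.
    echo "stack size (KiB): $(ulimit -s)"
}
```

Since `ulimit` describes the shell it runs in, it is most informative when run as the collectd user from the same environment that starts rrdcached.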
> >
> > A separate issue is that, from what I can tell of the code, the rrd
> > client is supposed to attempt a reconnect to the daemon in the event
> > of the remote daemon restarting and the connection dying. However, it
> > does seem that this doesn't necessarily happen: I've had to restart
> > the MRTG daemon, and you apparently need to restart collectd when
> > rrdcached is restarted.
> >
> > Steve
> >
> > Steve Shipway
> > University of Auckland ITS
> > UNIX Systems Design Lead
> > s.shipway at auckland.ac.nz
> > Ph: +64 9 373 7599 ext 86487
> 
> _______________________________________________
> rrd-users mailing list
> rrd-users at lists.oetiker.ch
> https://lists.oetiker.ch/cgi-bin/listinfo/rrd-users