[rrd-developers] Shutdown race-condition in rrdcached

Christian Hitz christian.hitz at aizo.com
Mon Mar 5 17:23:09 CET 2012


Hi all,

we have noticed a race condition during the shutdown of rrdcached which causes
rrdcached to hang indefinitely. This happens when rrdcached receives SIGTERMs
while it is already flushing its cached data to the RRDs.

We can reliably reproduce the situation by executing:

	for i in $(seq 20); do killall rrdcached; done

The log output is the following:

	Mar  5 16:55:34 rrdcached[1090]: caught SIGTERM
	Mar  5 16:55:34 rrdcached[1090]: starting shutdown
	Mar  5 16:55:34 rrdcached[1090]: caught SIGTERM
	Mar  5 16:56:41 rrdcached[1090]: last message repeated 19 times

In this state, rrdcached can only be terminated with "kill -9".

This seems to be caused by a race condition between flush_thread and the signal
handler. Both change the state variable that queue_thread tests as its exit
condition. Applying the following patch seems to fix the described behavior:
rrdcached correctly flushes and shuts down cleanly.

Index: src/rrd_daemon.c
===================================================================
--- src/rrd_daemon.c	(revision 2281)
+++ src/rrd_daemon.c	(working copy)
@@ -295,7 +295,9 @@
 static void sig_common (const char *sig) /* {{{ */
 {
   RRDD_LOG(LOG_NOTICE, "caught SIG%s", sig);
-  state = FLUSHING;
+  if (state == RUNNING) {
+      state = FLUSHING;
+  }
   pthread_cond_broadcast(&flush_cond);
   pthread_cond_broadcast(&queue_cond);
 } /* }}} void sig_common */
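
To illustrate the suspected interleaving, here is a minimal single-threaded
sketch. It is not the rrdcached code: only RUNNING and FLUSHING are taken from
the patch above, while SHUTTING_DOWN, flush_done(), queue_may_exit() and
replay() are hypothetical names, and the assumption is that the flush thread
advances the state once flushing has finished.

```c
#include <assert.h>
#include <stdbool.h>

/* RUNNING and FLUSHING appear in the patch; SHUTTING_DOWN is an assumed name
 * for whatever state follows a completed flush. */
typedef enum { RUNNING, FLUSHING, SHUTTING_DOWN } daemon_state_t;

/* Unpatched handler behavior: overwrite the state unconditionally. */
static void sig_unguarded(daemon_state_t *state)
{
  *state = FLUSHING;
}

/* Patched handler behavior: only move from RUNNING to FLUSHING. */
static void sig_guarded(daemon_state_t *state)
{
  if (*state == RUNNING)
    *state = FLUSHING;
}

/* Hypothetical stand-in for the flush thread finishing its work. */
static void flush_done(daemon_state_t *state)
{
  assert(*state == FLUSHING); /* a flush only starts after a signal */
  *state = SHUTTING_DOWN;
}

/* Hypothetical stand-in for the exit test in the queue thread. */
static bool queue_may_exit(daemon_state_t state)
{
  return state == SHUTTING_DOWN;
}

/* Replay the sequence: SIGTERM, flush completes, late SIGTERM.
 * Returns whether the queue thread would be allowed to exit afterwards. */
static bool replay(void (*handler)(daemon_state_t *))
{
  daemon_state_t state = RUNNING;
  handler(&state);    /* first SIGTERM starts the flush */
  flush_done(&state); /* flush thread advances the state */
  handler(&state);    /* second SIGTERM arrives late */
  return queue_may_exit(state);
}
```

In this sketch, replay(sig_unguarded) returns false because the late signal
resets the state to FLUSHING and the exit test never passes, whereas
replay(sig_guarded) returns true because the late signal leaves the state
untouched.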

Regards,
Christian

-- 
Christian Hitz
aizo ag, Schlieren, Switzerland, www.aizo.com


