[rrd-developers] Shutdown race-condition in rrdcached
Christian Hitz
christian.hitz at aizo.com
Mon Mar 5 17:23:09 CET 2012
Hi all,
we have noticed a race condition during the shutdown of rrdcached that causes
rrdcached to hang indefinitely. This happens when rrdcached receives SIGTERM
signals while it is already flushing its cached data to the RRDs.
We can consistently reproduce the situation by executing:
for i in $(seq 20); do killall rrdcached; done
The log output is the following:
Mar 5 16:55:34 rrdcached[1090]: caught SIGTERM
Mar 5 16:55:34 rrdcached[1090]: starting shutdown
Mar 5 16:55:34 rrdcached[1090]: caught SIGTERM
Mar 5 16:56:41 rrdcached[1090]: last message repeated 19 times
In this state, rrdcached can only be terminated with "kill -9".
This seems to be caused by a race condition between flush_thread and the signal
handler. Both change the state variable that queue_thread tests as its exit condition.
Applying the following patch seems to fix the described behavior: rrdcached
correctly flushes and shuts down cleanly.
Index: src/rrd_daemon.c
===================================================================
--- src/rrd_daemon.c (revision 2281)
+++ src/rrd_daemon.c (working copy)
@@ -295,7 +295,9 @@
 static void sig_common (const char *sig) /* {{{ */
 {
   RRDD_LOG(LOG_NOTICE, "caught SIG%s", sig);
-  state = FLUSHING;
+  if (state == RUNNING) {
+    state = FLUSHING;
+  }
   pthread_cond_broadcast(&flush_cond);
   pthread_cond_broadcast(&queue_cond);
 } /* }}} void sig_common */
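To illustrate why the guard matters, here is a minimal, single-threaded sketch of the state transitions. It is not taken from rrd_daemon.c: the helper names (on_signal_unguarded, on_signal_guarded, on_flush_done, replay) are hypothetical, and the SHUTDOWN state is an assumption standing in for whatever state the flush thread sets once it has finished. The sketch replays one interleaving: a first signal, the flush completing, then a late second signal.

```c
/* RUNNING and FLUSHING appear in the patch; SHUTDOWN is an assumed name
 * for the state the flush thread sets when flushing is complete. */
typedef enum { RUNNING, FLUSHING, SHUTDOWN } daemon_state_t;

/* Old behaviour: every signal unconditionally resets the state. */
static daemon_state_t on_signal_unguarded(daemon_state_t s)
{
  (void) s;
  return FLUSHING;
}

/* Patched behaviour: only a signal received while RUNNING starts the flush. */
static daemon_state_t on_signal_guarded(daemon_state_t s)
{
  return (s == RUNNING) ? FLUSHING : s;
}

/* Models the flush thread finishing its work (assumed transition). */
static daemon_state_t on_flush_done(daemon_state_t s)
{
  (void) s;
  return SHUTDOWN;
}

/* Replays: first signal, flush completes, late second signal.
 * Returns the state the queue thread would observe afterwards. */
static daemon_state_t replay(daemon_state_t (*on_signal)(daemon_state_t))
{
  daemon_state_t s = RUNNING;
  s = on_signal(s);      /* first SIGTERM */
  s = on_flush_done(s);  /* flush thread advances the state */
  s = on_signal(s);      /* second SIGTERM arrives late */
  return s;
}
```

With the unguarded handler, replay() ends in FLUSHING again, so a thread that waits for the state to move past FLUSHING would wait forever. With the guarded handler the state stays where the flush thread left it.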
Regards,
Christian
--
Christian Hitz
aizo ag, Schlieren, Switzerland, www.aizo.com