[rrd-developers] Shutdown race-condition in rrdcached
Tobias Oetiker
tobi at oetiker.ch
Mon Mar 5 17:49:13 CET 2012
Hi Christian,
thanks!
tobi
Today Christian Hitz wrote:
> Hi all,
>
> we have noticed a race condition during the shutdown of rrdcached which causes
> rrdcached to hang indefinitely. This happens when rrdcached receives further
> SIGTERM signals while it is already flushing its cached data to the RRDs.
>
> We can reliably reproduce the hang by executing:
>
> for i in $(seq 20); do killall rrdcached; done
>
> The resulting log output is:
>
> Mar 5 16:55:34 rrdcached[1090]: caught SIGTERM
> Mar 5 16:55:34 rrdcached[1090]: starting shutdown
> Mar 5 16:55:34 rrdcached[1090]: caught SIGTERM
> Mar 5 16:56:41 rrdcached[1090]: last message repeated 19 times
>
> In this state, rrdcached can only be terminated with "kill -9".
>
> This seems to be caused by a race condition between flush_thread and the signal
> handler. Both change the state variable that queue_thread tests as its exit condition.
> Applying the following patch seems to fix the described behavior: rrdcached
> correctly flushes and shuts down cleanly.
>
> Index: src/rrd_daemon.c
> ===================================================================
> --- src/rrd_daemon.c (revision 2281)
> +++ src/rrd_daemon.c (working copy)
> @@ -295,7 +295,9 @@
> static void sig_common (const char *sig) /* {{{ */
> {
> RRDD_LOG(LOG_NOTICE, "caught SIG%s", sig);
> - state = FLUSHING;
> + if (state == RUNNING) {
> + state = FLUSHING;
> + }
> pthread_cond_broadcast(&flush_cond);
> pthread_cond_broadcast(&queue_cond);
> } /* }}} void sig_common */
>
> Regards,
> Christian
>
>
--
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900