[rrd-developers] Shutdown race-condition in rrdcached

Tobias Oetiker tobi at oetiker.ch
Mon Mar 5 17:49:13 CET 2012


Hi Christian,

thanks!
tobi

Today Christian Hitz wrote:

> Hi all,
>
> we have noticed a race condition during the shutdown of rrdcached which causes
> rrdcached to hang indefinitely. This happens when rrdcached receives SIGTERMs
> while it is already flushing its cached data to the RRDs.
>
> We can consistently reproduce the situation by executing:
>
> 	for i in $(seq 20); do killall rrdcached; done
>
> The log output is the following:
>
> 	Mar  5 16:55:34 rrdcached[1090]: caught SIGTERM
> 	Mar  5 16:55:34 rrdcached[1090]: starting shutdown
> 	Mar  5 16:55:34 rrdcached[1090]: caught SIGTERM
> 	Mar  5 16:56:41 rrdcached[1090]: last message repeated 19 times
>
> In this state, rrdcached can only be terminated with "kill -9".
>
> This seems to be caused by a race condition between flush_thread and the signal
> handler. Both change the state variable that queue_thread tests as exit condition.
> Applying the following patch seems to fix the described behavior: rrdcached
> correctly flushes and shuts down cleanly.
>
> Index: src/rrd_daemon.c
> ===================================================================
> --- src/rrd_daemon.c	(revision 2281)
> +++ src/rrd_daemon.c	(working copy)
> @@ -295,7 +295,9 @@
>  static void sig_common (const char *sig) /* {{{ */
>  {
>    RRDD_LOG(LOG_NOTICE, "caught SIG%s", sig);
> -  state = FLUSHING;
> +  if (state == RUNNING) {
> +      state = FLUSHING;
> +  }
>    pthread_cond_broadcast(&flush_cond);
>    pthread_cond_broadcast(&queue_cond);
>  } /* }}} void sig_common */
>
> Regards,
> Christian
>
>
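
For anyone reading this in the archive, the effect of the guard in the patch
can be sketched in isolation. This is only a minimal, single-threaded replay
of the event order described above, not the code in src/rrd_daemon.c.
RUNNING and FLUSHING come from the patch; the SHUTDOWN state and the names
daemon_state_t, flush_done, replay and report are assumptions made for the
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* RUNNING and FLUSHING appear in the patch; SHUTDOWN is an assumed name
 * for the final state that the queue thread waits for. */
typedef enum { RUNNING, FLUSHING, SHUTDOWN } daemon_state_t;

/* Handler before the patch: every signal resets the state to FLUSHING. */
void on_signal_unguarded(daemon_state_t *state)
{
    *state = FLUSHING;
}

/* Handler after the patch: only a signal received while RUNNING
 * starts the flush; later signals leave the state alone. */
void on_signal_guarded(daemon_state_t *state)
{
    if (*state == RUNNING) {
        *state = FLUSHING;
    }
}

/* Hypothetical stand-in for the flush thread finishing its work. */
void flush_done(daemon_state_t *state)
{
    *state = SHUTDOWN;
}

/* Replays one ordering: first signal, flush completes, late signal.
 * Returns the state a waiting thread would observe afterwards. */
daemon_state_t replay(void (*handler)(daemon_state_t *))
{
    assert(handler != NULL);
    daemon_state_t state = RUNNING;
    handler(&state);     /* first SIGTERM starts the flush        */
    flush_done(&state);  /* flush finishes, state moves forward   */
    handler(&state);     /* a further SIGTERM arrives afterwards  */
    return state;
}

/* Prints whether each variant still ends in SHUTDOWN. */
void report(void)
{
    printf("unguarded ends in SHUTDOWN: %s\n",
           replay(on_signal_unguarded) == SHUTDOWN ? "yes" : "no");
    printf("guarded ends in SHUTDOWN:   %s\n",
           replay(on_signal_guarded) == SHUTDOWN ? "yes" : "no");
}
```

With the unguarded handler the late signal moves the state back to FLUSHING,
so a thread waiting for the final state would never see it. With the guard
the state stays where the flush left it. The real daemon is multi-threaded,
so this shows only the ordering problem, not the locking around it.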

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900


