[rrd-developers] rrdcached shutdown

Fri Sep 26 14:29:30 CEST 2008

Today Florian Forster wrote:

> Hi Tobi,
>
> On Thu, Sep 25, 2008 at 11:43:48PM +0200, Tobias Oetiker wrote:
> > the point of using TERM is that as a system goes down, normally
> > processes that are still hanging around are sent TERM and shortly
> > after KILL, so it is a good thing for a process to quickly get ready
> > to die when it gets TERM.
>
> yes, but before any of that SIGTERM/SIGKILL business, the init scripts
> are run. And in the init script you can do something like:
>  . /etc/default/rrdtool
>  if test "$FLUSH_ON_EXIT" -eq 0
>  then
>    kill -USR1 `pidof rrdcached`
>  else
>    kill -TERM `pidof rrdcached`
>  fi
>
> Additionally an init script could provide two stop actions:
>  # /etc/init.d/rrdcached stop
>  # /etc/init.d/rrdcached stop-noflush
> (And appropriate restart actions, of course.)

Yes, by all means ... and if the script prints 'waiting for
rrdcached while it flushes', then it's even better.
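
A minimal sketch of such a wait loop, for illustration only. The
function name wait_for_exit is made up here, and the sketch leaves
open which signal the init script sent beforehand:

```shell
#!/bin/sh
# Hypothetical helper for an init script's stop action (not part of rrdtool).
# usage: wait_for_exit PID
wait_for_exit() {
    pid=$1
    printf 'waiting for rrdcached while it flushes'
    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        printf '.'
        sleep 1
    done
    printf ' done\n'
}
```

The init script would call this right after signalling the daemon, so
the user sees why the stop action has not returned yet.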

> > also when a user does kill PID the process should die and not suddenly
> > start using the disk like mad for 20 minutes ... if it does that the
> > user will send it a kill -KILL and this may not be what we want at all
> > ...
>
> No, when a user does a `kill <pid>', a *daemon* should catch the signal
> and *shut down gracefully*. Don't let the name of the tool misguide you,
> think `sendsignal' instead ;)
[...]

You are also maintaining an open-source project, and you may have
different experience than me, but I find that users are quite easily
confused, and they do not generally read documentation. So I want
my tools to act in a way that is convenient and unsurprising for
the users. Meaning: if I am told to terminate, I terminate,
quickly.

For a machine shutdown without scripts, where the process gets sent
kill TERM prior to kill KILL, I want it to act sensibly as well.
Sensibly in this context means: shut down as quickly as possible
without losing data.

Thirdly, speaking as a Unix admin who manages systems where users run
their own daemons: I do not know the right signal for each daemon
to quit. So what I do, if I want a daemon to end, is

 kill <daemon-pid>

and if it does not die within a reasonable time, it gets

 kill -9 <daemon-pid>
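
Written out as a script, that two-step habit looks roughly like the
sketch below. The function name stop_daemon and the 10 second default
are assumptions for illustration, not anything rrdcached ships:

```shell
#!/bin/sh
# Hypothetical helper: ask politely with TERM, then force with KILL.
# usage: stop_daemon PID [GRACE_SECONDS]
# returns 0 if the process went away on its own, 1 if KILL was needed.
stop_daemon() {
    pid=$1
    grace=${2:-10}                              # assumed default grace period
    kill -TERM "$pid" 2>/dev/null || return 0   # process is already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # it exited after TERM
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    kill -KILL "$pid" 2>/dev/null               # out of patience
    return 1
}
```

A daemon that needs longer than the grace period to react to TERM is
cut off in the middle of whatever it was doing, which is exactly the
case a slow flush on TERM runs into.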

There may be a ton of cool signals for the initiated, to get
rrdcached to sing and dance, sit up and beg even, but when it comes
down to the basics, I see no reason for taking a risk in not
quitting as fast as it reasonably can when it gets a SIGTERM.

There could even be a tradeoff where rrdcached first tries to sync
everything back to the rrd files, and after 3 seconds or so it stops
doing that and just syncs the journal to disk.
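
The control flow of that tradeoff could be sketched as below. This is
only an illustration of the idea, not rrdcached code: flush_one and
journal_one are hypothetical stand-ins that merely record what they
were given, and `date +%s` is assumed to be available:

```shell
#!/bin/sh
# Hypothetical sketch: flush pending updates until a deadline passes,
# then hand everything that is left to the journal.
FLUSHED=""
JOURNALED=""

flush_one()   { FLUSHED="$FLUSHED $1"; }      # stand-in: write one update to its rrd file
journal_one() { JOURNALED="$JOURNALED $1"; }  # stand-in: append one update to the journal

# usage: shutdown_flush GRACE_SECONDS item...
shutdown_flush() {
    grace=$1
    shift
    deadline=$(( $(date +%s) + grace ))
    for item in "$@"; do
        if [ "$(date +%s)" -lt "$deadline" ]; then
            flush_one "$item"      # still within the grace period
        else
            journal_one "$item"    # deadline passed: journal the rest
        fi
    done
}
```

Calling `shutdown_flush 3 ...` on TERM would bound the time spent
flushing to roughly 3 seconds plus the cost of the journal writes,
while a grace period of 0 sends everything straight to the journal.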

But, to repeat myself, if we just hang around doing our thing
(syncing all outstanding IO to rrds), blatantly ignoring the
termination signal we received, then we are not behaving the way I
want us to behave towards the user. The user is our boss, and if he
tells us to terminate, we should terminate and not hang around
mumbling about the jobs we want to finish first.

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900