[smokeping-users] Re: Alerts cause Smokeping to stop working
Craig Dibble
craig at rootdev.com
Tue May 2 09:10:42 MEST 2006
Hi Niko, thanks for your prompt reply. Responses inline.
Niko Tyni wrote:
> Hi,
>
> some clarifications:
>
> - Do you have any alerts enabled in the Targets section?
Yes. I cut the Targets section for brevity, but the alerts are
set up in the following fashion on both servers:
*** Targets ***
probe = FPing
menu = Top
title = Network Latency Grapher
remark = Welcome to SmokePing
+ Server B
menu = Server B
title = Server B
++ core
menu = Core
title = Core
alerts = bigloss,someloss,startloss
+++ router1
menu = router1
title = router1
rawlog=%Y-%m-%d
host = <IP Address>
and so on...
> - Is the above quote from server A or server B? If from A, please include
> it from server B too. (Server A is not interesting here; it's working
> 'well enough' and is an ancient version.)
It was Server B, but as I pointed out, the only difference in the base
config was the addition of the concurrentprobes line on Server B.
> - When server B stops logging, does the smokeping daemon die or is it
> just doing nothing? Does it recover when the unresponsive devices
> come back?
It's still running but doing nothing. As soon as the unresponsive
device recovers, it starts logging data again.
> There are two problems here: the parameters should be tuned so that
> you never get the 'smokeping took ... seconds' message, even when
> the targets are down, but obviously Smokeping should recover from it.
I thought that too. Our step times are only 60 seconds on both systems,
so with 127 targets on Server A it's probably no surprise we get a lot
of log messages to this effect. But the time to check the 11 targets on
Server B when one was not responding was sitting fairly consistently at
130 seconds, which seems implausibly long.
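To put numbers on that, here is a back-of-envelope sketch in Python. It is not anything Smokeping computes itself; the function name steps_spanned is my own, and the only inputs are the 60 second step and the roughly 130 second round time mentioned above.

```python
import math


def steps_spanned(round_seconds: float, step_seconds: float) -> int:
    """Number of step intervals one probe round occupies (at least 1)."""
    return max(1, math.ceil(round_seconds / step_seconds))


STEP = 60          # step time on both servers, in seconds
ROUND_TIME = 130   # observed round time on Server B with one target down

spanned = steps_spanned(ROUND_TIME, STEP)
overrun = ROUND_TIME - STEP

print(f"round spans {spanned} steps, overrunning the step by {overrun} s")
```

So a single round on Server B occupies three 60 second slots and runs 70 seconds past the step, which fits with seeing the 'smokeping took ... seconds' message on every round during the outage.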
> I don't quite understand why the messages show up in the first place
> with the FPing parameters you have, but I'll look into that.
>
> The best help would be the output of 'smokeping --debug-daemon' at
> outage time, but of course I realize that might be a bit hard to get
> given the verboseness of it even when everything is OK.
>
> The output of 'smokeping --debug' when everything works would be good
> to have too.
OK, catching it when it's failing will indeed be difficult, but if the
opportunity presents itself I will try.
Do you mind if I send you the debug output off-list? It's large and
would need to be fairly extensively edited for confidentiality reasons.
Thanks again,
Craig