[smokeping-users] Re: Alerts cause Smokeping to stop working

Craig Dibble craig at rootdev.com
Tue May 2 09:10:42 MEST 2006


Hi Niko, thanks for your prompt reply. Responses inline.


Niko Tyni wrote:

> Hi,
> 
> some clarifications: 
> 
> - Do you have any alerts enabled in the Targets section?

Yes. I cut the Targets section for brevity, but the alerts are set up
in the following fashion on both servers:

*** Targets ***

probe = FPing

menu = Top
title = Network Latency Grapher
remark = Welcome to SmokePing

+ Server B
menu = Server B
title = Server B

++ core
menu = Core
title = Core
alerts = bigloss,someloss,startloss

+++ router1
menu = router1
title = router1
rawlog=%Y-%m-%d
host = <IP Address>

and so on...

> - Is the above quote from server A or server B? If from A, please include
>   it from server B too. (Server A is not interesting here; it's working
>   'well enough' and is an ancient version.)

It was Server B, but as I pointed out, the only difference in the base 
config was the addition of the concurrentprobes line on Server B.

> - When server B stops logging, does the smokeping daemon die or is it
>   just doing nothing? Does it recover when the unresponsible devices
>   come back?

It's still running but doing nothing. As soon as the unresponsive
device recovers, it starts logging data again.

> There are two problems here: the parameters should be tuned so that
> you never get the 'smokeping took ... seconds' message, even when
> the targets are down, but obviously Smokeping should recover from it.

I thought that too. Our step times are only 60 seconds on both systems,
so with 127 targets on Server A it's probably no surprise that we get a
lot of log messages to this effect. But the time to check the 11 targets
on Server B when one was not responding was sitting fairly consistently
at 130 seconds, which seems implausibly long.
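
To put those numbers in perspective, here is a rough sketch of the
arithmetic in Python. It is purely illustrative: the function name is
made up and is not part of Smokeping, and it assumes a new round can only
start once the previous one has finished.

```python
import math

def missed_steps(round_seconds, step_seconds):
    """Count the step intervals that come due while one probe round
    is still running (hypothetical helper, not a Smokeping function)."""
    # A round that fits within one step misses nothing; otherwise every
    # step boundary crossed before the round ends is a missed sample.
    return max(0, math.ceil(round_seconds / step_seconds) - 1)

# Figures from this thread: 60 s step, ~130 s per round on Server B.
print(missed_steps(130, 60))  # 2
```

So on these assumptions each 130-second round would skip two of the
60-second steps, which matches the gaps we see while a device is down.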

> I don't quite understand why the messages show up in the first place
> with the FPing parameters you have, but I'll look into that.
> 
> The best help would be the output of 'smokeping --debug-daemon' at 
> outage time, but of course I realize that might be a bit hard to get
> given the verboseness of it even when everything is OK.
> 
> The output of 'smokeping --debug' when everything works would be good 
> to have too.

OK, catching it when it's failing will indeed be difficult, but if the
opportunity presents itself I will try.

Do you mind if I send you the debug output offlist? It's large and
would need to be fairly extensively edited for confidentiality reasons.

Thanks again,
Craig
