[smokeping-users] Alerts cause Smokeping to stop working

Craig Dibble craig at rootdev.com
Tue May 2 03:20:59 MEST 2006


Hi all,

I've got two servers running Smokeping - one (Server A) monitoring 127 
hosts with fping, the other (Server B) in another city monitoring just 
11 hosts.

When one device stops responding on Server B, Smokeping stops logging 
data for all the hosts that server monitors, but when the same thing 
happens on Server A, it steps over the failures and carries on.

During outages the logs on both servers are filled with messages like this:

May  1 10:55:20 mon01 smokeping[9540]: FPing: WARNING: smokeping took 
130 seconds to complete 1 round of polling. It should complete polling 
in 60 seconds. You may have unresponsive devices in your setup.
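
To compare the two servers, it may help to pull the numbers out of these warnings rather than reading them by eye. The following is a minimal sketch, not part of Smokeping: the function name parse_overrun and the regex group names are made up for illustration, and the message format is assumed to match the sample above exactly.

```python
import re

# Pattern for the Smokeping overrun warning quoted above. The group names
# (probe, took, expected) are labels chosen for this sketch.
WARNING_RE = re.compile(
    r"(?P<probe>\w+): WARNING: smokeping took (?P<took>\d+) seconds "
    r"to complete 1 round of polling\. It should complete polling "
    r"in (?P<expected>\d+) seconds\."
)


def parse_overrun(text):
    """Return (probe, took, expected) for the first warning in text, else None."""
    # Mail clients and archives wrap long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = WARNING_RE.search(flat)
    if match is None:
        return None
    return match.group("probe"), int(match.group("took")), int(match.group("expected"))


sample = """May  1 10:55:20 mon01 smokeping[9540]: FPing: WARNING: smokeping took
130 seconds to complete 1 round of polling. It should complete polling
in 60 seconds. You may have unresponsive devices in your setup."""

probe, took, expected = parse_overrun(sample)
print(f"{probe}: round took {took}s, budget {expected}s, overrun {took - expected}s")
```

For the sample line this reports a 130 second round against a 60 second budget, i.e. the round ran 70 seconds past the configured step.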

The strange thing is, the configs are identical, apart from the Target 
definitions, and the fact that Server B has:

concurrentprobes = yes

set (although we only use one probe, so I'm not sure this is relevant, 
unless I have misunderstood the option).

Server A is running Smokeping 1.4; Server B is running 2.0.4.

I have seen a few mentions of a similar problem in the list archives, 
but I haven't found a satisfactory answer. I know I should probably 
upgrade Server A to a newer version, but I am obviously reluctant to do 
so when the old version works and the newer one doesn't.

My suspicion is that this is due to differences in FPing.pm between the 
two versions (the fping binary is the same on both boxes: 2.4b2), and 
that perhaps I need to edit FPing.pm or set a smaller timeout, but any 
advice would be gratefully received.

Many thanks,
Craig

Here is the config for Server B, up to the Target definitions:

*** General ***
owner    = <owner>
contact  = <owner at hostname>
mailhost = <mailhost>
sendmail = /usr/sbin/sendmail
imgcache = /usr/local/smokeping/webdocs/images
imgurl   = /images
datadir  = /usr/local/smokeping/var
piddir   = /usr/local/smokeping/var
cgiurl   = http://<hostname>/cgi-bin/smokeping.cgi
smokemail = /usr/local/smokeping/etc/smokemail
# specify this to get syslog logging
syslogfacility = local0
concurrentprobes = yes 	 # Not set on Server A

*** Alerts ***
to = <alert at localhost>
from = smokealert@<hostname>

+bigloss
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>0%,>0%,>0%
comment = suddenly there is packet loss

+someloss
type = loss
# in percent
pattern = >0%,*3*,>0%,*3*,>0%
comment = loss 3 times  in a row

+startloss
type = loss
# in percent
pattern = ==S,>0%,>0%,>0%
comment = loss at startup

+rttdetect
type = rtt
# in milli seconds
pattern = <10,<10,<10,<10,<10,<100,>100,>100,>100
comment = routing messed up again ?

*** Database ***

step     = 60
pings    = 10

# consfn mrhb steps total

AVERAGE  0.5   1  1008
AVERAGE  0.5  12  4320
     MIN  0.5  12  4320
     MAX  0.5  12  4320
AVERAGE  0.5 144   720
     MAX  0.5 144   720
     MIN  0.5 144   720

*** Presentation ***

template = /usr/local/smokeping/etc/basepage.html

+ overview

width = 600
height = 50
range = 10h

+ detail

width = 600
height = 200
unison_tolerance = 2

"Last 3 Hours"    3h
"Last 30 Hours"   30h
"Last 10 Days"    10d
"Last 400 Days"   400d

*** Probes ***

+ FPing

binary = /usr/sbin/fping

*** Targets ***

probe = FPing

menu = Top
title = Network Latency Grapher

<Target definitions...>
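
As a side check on the Database section, the time span each archive covers can be worked out from step, steps and total. This is only a sketch of the arithmetic, assuming the steps and total columns map directly onto the rrdtool RRA "steps" and "rows" values; the function name retention_seconds is made up for illustration.

```python
# Values copied from the *** Database *** section of the config above.
STEP_SECONDS = 60
RRAS = [
    ("AVERAGE", 1, 1008),
    ("AVERAGE", 12, 4320),
    ("MIN", 12, 4320),
    ("MAX", 12, 4320),
    ("AVERAGE", 144, 720),
    ("MAX", 144, 720),
    ("MIN", 144, 720),
]


def retention_seconds(step, steps, total):
    """Span of one archive: seconds per step * steps per row * number of rows."""
    return step * steps * total


for consfn, steps, total in RRAS:
    span = retention_seconds(STEP_SECONDS, steps, total)
    print(f"{consfn:>7} steps={steps:<3} total={total:<4} -> {span / 86400:.1f} days")
```

Under that assumption, with step = 60 the three archive sizes cover about 0.7, 36 and 72 days, which is shorter than the "Last 400 Days" range in the Presentation section.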
