[smokeping-users] Alerts cause Smokeping to stop working
Craig Dibble
craig at rootdev.com
Tue May 2 03:20:59 MEST 2006
Hi all,
I've got two servers running Smokeping - one (Server A) monitoring 127
hosts with fping, the other (Server B) in another city monitoring just
11 hosts.
When one device stops responding on Server B, Smokeping stops logging
data for all the hosts it is monitoring, but when the same thing happens
on Server A, it steps over the failures and carries on.
During outages the logs on both servers are filled with messages like this:
May 1 10:55:20 mon01 smokeping[9540]: FPing: WARNING: smokeping took
130 seconds to complete 1 round of polling. It should complete polling
in 60 seconds. You may have unresponsive devices in your setup.
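To put a number on the warning above, here is a minimal Python sketch that pulls the two durations out of the log line and works out how far the polling round overran its step. The regular expression is written against the wording of the message quoted above only; the function name `polling_overrun` is made up for illustration and is not part of Smokeping.

```python
import re

# Matches the FPing warning as quoted in the log excerpt above.
# The pattern is an assumption based on that one message, not on Smokeping's source.
WARNING_RE = re.compile(
    r"took\s+(?P<took>\d+)\s+seconds\s+to\s+complete\s+\d+\s+round"
    r".*?should\s+complete\s+polling\s+in\s+(?P<step>\d+)\s+seconds",
    re.DOTALL,
)


def polling_overrun(message):
    """Return (took, step, overrun_seconds, steps_spanned), or None if no match."""
    match = WARNING_RE.search(message)
    if match is None:
        return None
    took = int(match.group("took"))
    step = int(match.group("step"))
    overrun = took - step
    # Ceiling division: how many step intervals the round occupied.
    steps_spanned = -(-took // step)
    return took, step, overrun, steps_spanned


line = (
    "May 1 10:55:20 mon01 smokeping[9540]: FPing: WARNING: smokeping took "
    "130 seconds to complete 1 round of polling. It should complete polling "
    "in 60 seconds. You may have unresponsive devices in your setup."
)
print(polling_overrun(line))
```

For the quoted message, the round ran 70 seconds over a 60-second step, so a single round occupied three step intervals.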
The strange thing is, the configs are identical, apart from the Target
definitions, and the fact that Server B has:
concurrentprobes = yes
set (although we are only using one probe so I'm not sure this is
relevant, unless I misunderstood).
Server A is running version 1.4, compared to 2.0.4 on Server B.
I have seen a few mentions of a similar problem in the list archives,
but I haven't found a satisfactory answer. I know I should probably
upgrade Server A to a newer version, but obviously I am reluctant to do
so when the old version works and the newer one doesn't.
My suspicion is that it is something to do with differences in FPing.pm
between the two versions (fping binary version is the same on both
boxes: 2.4b2), and that perhaps I need to edit it or set a smaller
timeout, but any advice would be gratefully received.
Many thanks,
Craig
Here is the config for Server B, up to the Target definitions:
*** General ***
owner = <owner>
contact = <owner at hostname>
mailhost = <mailhost>
sendmail = /usr/sbin/sendmail
imgcache = /usr/local/smokeping/webdocs/images
imgurl = /images
datadir = /usr/local/smokeping/var
piddir = /usr/local/smokeping/var
cgiurl = http://<hostname>/cgi-bin/smokeping.cgi
smokemail = /usr/local/smokeping/etc/smokemail
# specify this to get syslog logging
syslogfacility = local0
concurrentprobes = yes # Not set on Server A
*** Alerts ***
to = <alert at localhost>
from = smokealert@<hostname>
+bigloss
type = loss
# in percent
pattern = ==0%,==0%,==0%,==0%,>0%,>0%,>0%
comment = suddenly there is packet loss
+someloss
type = loss
# in percent
pattern = >0%,*3*,>0%,*3*,>0%
comment = loss 3 times in a row
+startloss
type = loss
# in percent
pattern = ==S,>0%,>0%,>0%
comment = loss at startup
+rttdetect
type = rtt
# in milli seconds
pattern = <10,<10,<10,<10,<10,<100,>100,>100,>100
comment = routing messed up again ?
*** Database ***
step = 60
pings = 10
# consfn  mrhb  steps  total
AVERAGE   0.5     1    1008
AVERAGE   0.5    12    4320
MIN       0.5    12    4320
MAX       0.5    12    4320
AVERAGE   0.5   144     720
MAX       0.5   144     720
MIN       0.5   144     720
*** Presentation ***
template = /usr/local/smokeping/etc/basepage.html
+ overview
width = 600
height = 50
range = 10h
+ detail
width = 600
height = 200
unison_tolerance = 2
"Last 3 Hours" 3h
"Last 30 Hours" 30h
"Last 10 Days" 10d
"Last 400 Days" 400d
*** Probes ***
+ FPing
binary = /usr/sbin/fping
*** Targets ***
probe = FPing
menu = Top
title = Network Latency Grapher
<Target definitions...>
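For reference, the Database section above implies the following retention periods. This is a small Python sketch of the arithmetic, assuming the `steps` and `total` columns carry their usual RRDtool meaning (each archive row consolidates `steps` polling intervals, and `total` is the number of rows kept). The helper name `retention_seconds` is hypothetical.

```python
STEP_SECONDS = 60  # "step = 60" from the Database section

# (consolidation function, steps per row, total rows), one entry per distinct
# resolution in the config; the MIN/MAX archives share the same sizes.
ARCHIVES = [
    ("AVERAGE", 1, 1008),
    ("AVERAGE", 12, 4320),
    ("AVERAGE", 144, 720),
]


def retention_seconds(step, steps_per_row, rows):
    """Time span covered by one archive, assuming rows * steps * step seconds."""
    return step * steps_per_row * rows


for consfn, steps, rows in ARCHIVES:
    seconds = retention_seconds(STEP_SECONDS, steps, rows)
    print(f"{consfn} steps={steps} rows={rows}: "
          f"{seconds / 3600:.1f} hours ({seconds / 86400:.1f} days)")
```

Under that assumption the three resolutions cover 16.8 hours, 36 days and 72 days respectively.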