[smokeping-users] Slave gaps in all charts during outage
David Rees
drees76 at gmail.com
Fri Sep 18 01:02:17 CEST 2009
Hi,
We use smokeping to monitor a number of hosts on various networks. We
have a master with a handful of slaves which monitor various sites.
This morning we had an outage which affected one of those sites, but
the slaves which were monitoring the site that went down failed to
report any data at all for any network, even for networks that were
still reachable from the slave's network. Communication between the
master and the slaves was not affected.
The affected slaves were reporting this message:
WARNING Master said 500 read timeout
While the master had messages like:
RRDs::update ERROR: /var/lib/smokeping/rrd/slave/slave~site1.rrd:
illegal attempt to update using time 1253201797 when last update time
is 1253201797 (minimum one second step)
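To spell out how I read that RRD error, here is a small Python sketch of
the rule it states. It is only an illustration, not rrdtool or smokeping
code: `accept_update` is a name made up for this example, and the only
real value is the timestamp 1253201797 from the log line above.

```python
def accept_update(last_update: int, new_time: int) -> bool:
    """Return True if an update at new_time would be accepted.

    Mirrors the rule stated in the error message: the new timestamp
    must be at least one second later than the last update.
    (Hypothetical helper for illustration only.)
    """
    return new_time - last_update >= 1


last = 1253201797  # last update time taken from the log line
for t in (1253201797, 1253201798):
    status = "ok" if accept_update(last, t) else "rejected (minimum one second step)"
    print(t, status)
```

If that reading is right, the master was handed two updates for the same
RRD carrying the same timestamp, and the second one was refused.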
All machines are running smokeping 2.4.2. Any ideas?
The only thing I can think of is that DNS for the site that went down
was also down, so the master timed out trying to look up the site's
IP address?
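One way to check that guess would be to time a name lookup from the
master. A rough Python sketch using only the standard library follows;
`timed_lookup` is a name made up for this example, and I am not claiming
this is how smokeping itself resolves its targets.

```python
import socket
import time


def timed_lookup(host: str):
    """Resolve host and return (sorted addresses, seconds taken).

    A failed lookup returns an empty list, so the elapsed time shows
    how long the resolver blocked before giving up.
    (Hypothetical helper for illustration only.)
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, None)
        # Each entry ends with a sockaddr tuple whose first item is the address.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []
    elapsed = time.monotonic() - start
    return addresses, elapsed


if __name__ == "__main__":
    # A numeric address needs no DNS query, so it returns almost at once;
    # substitute a hostname from the affected site to see the real delay.
    addrs, secs = timed_lookup("127.0.0.1")
    print(addrs, f"{secs:.3f}s")
```

A lookup that blocks for many seconds while the site's DNS is down
would at least be consistent with the "500 read timeout" the slaves saw.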
Thanks
Dave