[smokeping-users] Slave gaps in all charts during outage

David Rees drees76 at gmail.com
Fri Sep 18 01:02:17 CEST 2009


We use smokeping to monitor a number of hosts on various networks.  We
have a master with a handful of slaves which monitor various sites.

This morning we had an outage which affected one of those sites, but
the slaves which were monitoring the site that went down failed to
report any data at all for any networks - even if they were reachable
from that network.  Communications between the master/slaves were not

The affected slaves were reporting this message:

WARNING Master said 500 read timeout

While the master had messages like:

RRDs::update ERROR: /var/lib/smokeping/rrd/slave/slave~site1.rrd:
illegal attempt to update using time 1253201797 when last update time
is 1253201797 (minimum one second step)

All machines are running smokeping 2.4.2.  Any ideas?

The only thing I can think of is that DNS for the site that went down
was also down, so the master timed out trying to look up the site's
IP address?
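
One way to check that theory would be to time a name lookup on the
master while the site's DNS is unreachable.  The following is only a
rough sketch in Python and is not part of smokeping: the function name
time_lookup, the 5 second timeout, and the hostname in the usage
comment are all made up for illustration.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def time_lookup(host, timeout=5.0, resolver=socket.gethostbyname):
    """Resolve `host` and report how long it took.

    Returns (address_or_None, seconds_elapsed, status), where status is
    "ok", "timeout", or "error: <reason>".  `resolver` is a hypothetical
    hook so the lookup function can be swapped out when testing.
    """
    # Run the lookup in a worker thread so we can stop waiting on it,
    # since the resolver call itself has no timeout argument.
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(resolver, host)
    try:
        addr = future.result(timeout=timeout)
        status = "ok"
    except FutureTimeout:
        addr, status = None, "timeout"
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addr, status = None, "error: %s" % exc
    finally:
        # Do not block here; a stuck lookup finishes in the background.
        pool.shutdown(wait=False)
    return addr, time.monotonic() - start, status


# Example (hypothetical hostname):
#   print(time_lookup("host.site1.example"))
```

If the lookup comes back as "timeout" or takes many seconds, that would
at least be consistent with the master stalling on DNS; a fast answer
would point somewhere else.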



