[smokeping-users] Slave gaps in all charts during outage
josh at imaginenetworksllc.com
Fri Sep 18 01:13:18 CEST 2009
To rule out DNS - are the boxes running a local DNS cache, or are they
using a secondary server? What's the TTL on those A/CNAME records, and how
long was your outage?
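
One rough way to see whether resolver stalls are in play is to time a lookup from the master and from one of the affected slaves. A minimal sketch, assuming Python is available on those boxes; `time_lookup` is a hypothetical helper name, not part of smokeping, and it only measures the system resolver call (it does not report the record's TTL):

```python
import socket
import time


def time_lookup(hostname):
    """Time one system-resolver call for hostname.

    Returns (elapsed_seconds, addresses), where addresses is a sorted
    list of IP strings, or None if the lookup failed.
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address string.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        # Resolution failed (NXDOMAIN, resolver unreachable, ...).
        addresses = None
    return time.monotonic() - start, addresses
```

Calling `time_lookup()` with the hostname of the site that went down, once on the master and once on a slave, should show whether a lookup returns quickly from cache or hangs for many seconds before failing.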
1100 Wayne St
Troy, OH 45373
"When you have eliminated the impossible, that which remains, however
improbable, must be the truth."
--- Sir Arthur Conan Doyle
On Thu, Sep 17, 2009 at 7:02 PM, David Rees <drees76 at gmail.com> wrote:
> We use smokeping to monitor a number of hosts on various networks. We
> have a master with a handful of slaves which monitor various sites.
> This morning we had an outage which affected one of those sites, but
> the slaves which were monitoring the site that went down failed to
> report any data at all for any networks - even if they were reachable
> from that network. Communications between the master/slaves were not
> The affected slaves were reporting this message:
> WARNING Master said 500 read timeout
> While the master had messages like:
> RRDs::update ERROR: /var/lib/smokeping/rrd/slave/slave~site1.rrd:
> illegal attempt to update using time 1253201797 when last update time
> is 1253201797 (minimum one second step)
> All machines are running smokeping 2.4.2. Any ideas?
> The only thing I can think of is that DNS for the site that went down
> was also down, so the master timed out trying to look up the site's
> IP address?
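
For reference, the master-side error quoted above states the rule it enforces: an update must carry a timestamp at least one second later than the RRD's last update. A minimal sketch of that check, using the timestamps from the log; `accept_update` is a hypothetical illustration, not an RRDtool or smokeping function:

```python
def accept_update(last_update, new_time, min_step=1):
    """Return True if new_time is at least min_step seconds after last_update.

    Mirrors the rule in the quoted error message ("minimum one second
    step"); both arguments are Unix timestamps in whole seconds.
    """
    return new_time - last_update >= min_step


# Timestamps taken from the quoted error message.
same_second = accept_update(1253201797, 1253201797)   # second update in the same second
next_second = accept_update(1253201797, 1253201798)   # update one second later
```

Here `same_second` is False and `next_second` is True, which matches an update for the same slave RRD being attempted twice within one second.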