[smokeping-users] Slave gaps in all charts during outage

Josh Luthman josh at imaginenetworksllc.com
Fri Sep 18 01:13:18 CEST 2009


To rule out DNS: are the boxes running a local caching DNS server, or are
they querying a separate (secondary) server?  What's the TTL on those
A/CNAME records, and how long did your outage last?
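
If it helps, here is a rough way to check whether lookups are slow or
failing from a given box. It is only a sketch using Python's standard
library; the function name time_lookup and the hostname in the comment are
made up for illustration, and it measures the system resolver, not anything
smokeping does internally.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one name lookup.

    Returns (elapsed_seconds, addresses). addresses is a sorted list of
    unique IP strings, or None if the resolver reported a failure.
    The resolver argument exists only so the function can be tested
    without touching the network.
    """
    start = time.monotonic()
    try:
        results = resolver(hostname, None)
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        addresses = sorted({entry[4][0] for entry in results})
    except socket.gaierror:
        addresses = None
    return time.monotonic() - start, addresses


# Example call (hypothetical hostname, replace with one of your targets):
#   elapsed, addresses = time_lookup("site1.example.com")
#   print(f"lookup took {elapsed:.2f}s, result: {addresses}")
```

Running it on the master and on a slave, once normally and once with the
site's DNS unreachable, should show whether a lookup can stall long enough
to matter.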

Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

"When you have eliminated the impossible, that which remains, however
improbable, must be the truth."
--- Sir Arthur Conan Doyle


On Thu, Sep 17, 2009 at 7:02 PM, David Rees <drees76 at gmail.com> wrote:

> Hi,
>
> We use smokeping to monitor a number of hosts on various networks.  We
> have a master with a handful of slaves which monitor various sites.
>
> This morning we had an outage which affected one of those sites, but
> the slaves monitoring the site that went down failed to report any
> data at all for any network, even for networks that were still
> reachable from the slaves' network.  Communications between the master
> and slaves were not affected.
>
> The affected slaves were reporting this message:
>
> WARNING Master said 500 read timeout
>
> While the master had messages like:
>
> RRDs::update ERROR: /var/lib/smokeping/rrd/slave/slave~site1.rrd:
> illegal attempt to update using time 1253201797 when last update time
> is 1253201797 (minimum one second step)
>
> All machines are running smokeping 2.4.2.  Any ideas?
>
> The only thing I can think of is that DNS for the site that went down
> was also down, so the master timed out trying to look up the site's
> IP address?
>
> Thanks
>
> Dave