[smokeping-users] Slave gaps in all charts during outage

Fri Sep 18 01:36:16 CEST 2009

Well, if communication between the two servers was fine at layer 3 but
name resolution (layer 7) failed, your problem there was that the slave
didn't know the master's IP address.

You could raise the TTL to 4 hours, longer than your roughly 3-hour outage,
and it could have worked in that last scenario; 8 hours would give more margin.
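A rough way to check that arithmetic: a cached answer stays usable only for whatever is left of its TTL, so the remaining TTL has to cover the whole outage. The sketch below assumes the caching resolver stops answering once its copy expires, and `cache_outlives_outage` is a made-up name for illustration. Note that a longer TTL only improves the odds: if the record had already sat in the cache for a while when DNS broke, less of the TTL is left.

```python
from datetime import timedelta


def cache_outlives_outage(ttl: timedelta, outage: timedelta,
                          age_at_outage: timedelta = timedelta(0)) -> bool:
    """Return True if a cached DNS answer stays valid for the whole outage.

    age_at_outage is how long the record had already been cached when
    resolution started failing (0 means it was fetched just before, the
    best case).
    """
    remaining = ttl - age_at_outage
    return remaining >= outage


outage = timedelta(hours=3)  # "close to 3 hours" from the thread

print(cache_outlives_outage(timedelta(minutes=5), outage))  # 5-minute TTL
print(cache_outlives_outage(timedelta(hours=4), outage))    # 4-hour TTL
print(cache_outlives_outage(timedelta(hours=4), outage,
                            age_at_outage=timedelta(hours=2)))
```

The last call shows the worst-case caveat: a 4-hour TTL that is already 2 hours old leaves only 2 hours of cover.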

For DNS on something like this I suggest you keep a long TTL on the record,
say a week. If you know you're going to change it, lower the TTL to half an
hour or a full hour a week in advance of the change, so every cached
week-long copy has expired by the time you switch. Then change it to the
new IP and put the TTL back to a week.
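The timing of that procedure can be sketched as a small schedule. The key constraint is that the short TTL must be published at least one long TTL before the switch, because a resolver that cached the record just before you lowered it may keep the old week-long copy for up to a week. `plan_ip_change` is a hypothetical helper, and waiting one short TTL before raising the TTL again is a conservative assumption of this sketch, not part of the procedure described above.

```python
from datetime import datetime, timedelta


def plan_ip_change(change_at: datetime,
                   long_ttl: timedelta = timedelta(weeks=1),
                   short_ttl: timedelta = timedelta(hours=1)):
    """Return (time, action) steps for moving one A record to a new IP."""
    return [
        # Caches may hold the old long-TTL answer for up to long_ttl,
        # so the short TTL has to go out at least that far ahead.
        (change_at - long_ttl, f"set TTL to {short_ttl}"),
        # From here on, no cache holds the old IP longer than short_ttl.
        (change_at, "point the record at the new IP"),
        # Conservative choice: wait one short TTL so every cache has
        # picked up the new IP before making the record long-lived again.
        (change_at + short_ttl, f"set TTL back to {long_ttl}"),
    ]


for when, action in plan_ip_change(datetime(2009, 10, 1, 12, 0)):
    print(when.isoformat(), action)
```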

Josh Luthman

On Thu, Sep 17, 2009 at 7:29 PM, David Rees <drees76 at gmail.com> wrote:

> On Thu, Sep 17, 2009 at 4:13 PM, Josh Luthman
> <josh at imaginenetworksllc.com> wrote:
> > To rule out DNS - are the boxes using a DNS cache server on themselves or
> > using a secondary server?  What's the TTL on those A/CNAME records and how
> > long was your outage?
>
> All the boxes use a caching DNS server. The TTL on the record for the host
> that went down (the one the affected slaves were monitoring) was 5 minutes,
> and it was down for close to 3 hours.
>
> I've since changed my config to use IP addresses for the hosts, but it'd
> be nice not to have to, and for the slaves to cache the last lookup in
> case there is a DNS failure...
>
> -Dave
>
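The "cache the last lookup" idea Dave asks for can be sketched as a resolver wrapper that remembers the last successful answer per host and reuses it when resolution fails. This is not SmokePing code; it is an illustration in Python, and `StickyResolver`, `flaky_resolver` and `master.example.net` are hypothetical names. The resolver is injectable (defaulting to `socket.gethostbyname`) so the failure path can be exercised without a real DNS outage. The trade-off is that a stale answer is wrong if the host really did move, which is why a long TTL on the record itself is the simpler fix.

```python
import socket
from typing import Callable, Dict


class StickyResolver:
    """Resolve host names, falling back to the last good answer on failure."""

    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname):
        self._resolve = resolve
        self._last_good: Dict[str, str] = {}

    def lookup(self, host: str) -> str:
        try:
            addr = self._resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            # DNS failed: reuse the previous answer if there is one,
            # otherwise re-raise the original error to the caller.
            if host in self._last_good:
                return self._last_good[host]
            raise
        self._last_good[host] = addr
        return addr


# Usage with a fake resolver that can be switched into a failing state.
state = {"dns_down": False}


def flaky_resolver(host: str) -> str:
    if state["dns_down"]:
        raise socket.gaierror("temporary failure in name resolution")
    return "192.0.2.10"


r = StickyResolver(flaky_resolver)
print(r.lookup("master.example.net"))  # resolved normally
state["dns_down"] = True
print(r.lookup("master.example.net"))  # served from the last good answer
```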