[smokeping-users] Slave gaps in all charts during outage
josh at imaginenetworksllc.com
Fri Sep 18 01:36:16 CEST 2009
Well, if communication between the two servers was fine at layer 3 but the
name couldn't be resolved (a layer 7 problem), then your problem was that
the slave didn't know what IP the master was.
You could raise the TTL to 4 hours, or 8 hours, etc., and it might have
worked in that last scenario.
For DNS on something like this I suggest you keep a long TTL on the record,
say a week. If you know you're going to change it, lower the TTL to half an
hour or a full hour a week in advance of the change. Then change it to the
new IP and put the TTL back to a week.
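
As a rough sketch of that schedule, the dates can be worked out like this. The
function name, the defaults and the example date are only illustrative
assumptions, and it presumes resolvers actually honor the TTL they were given.

```python
from datetime import datetime, timedelta

def ttl_change_plan(change_at, long_ttl=timedelta(weeks=1),
                    short_ttl=timedelta(hours=1)):
    """Return (when, step) pairs for moving a record to a new IP.

    Hypothetical helper: it only does the date arithmetic, it does not
    talk to any DNS server.
    """
    # Lower the TTL one full long-TTL period ahead, so every cache still
    # holding the week-long record has expired it by the time of the change.
    lower_at = change_at - long_ttl
    # After the change, old answers can linger for at most the short TTL.
    settled_at = change_at + short_ttl
    return [
        (lower_at, "lower TTL to %d s" % short_ttl.total_seconds()),
        (change_at, "set new IP, restore TTL to %d s" % long_ttl.total_seconds()),
        (settled_at, "caches that honor TTLs should now hold the new IP"),
    ]

# Example: a change planned for noon on 25 Sep 2009 (made-up date).
for when, step in ttl_change_plan(datetime(2009, 9, 25, 12, 0)):
    print(when, step)
```

The new record can go back to the long TTL right away because the only stale
copies left in caches are the short-TTL ones.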
1100 Wayne St
Troy, OH 45373
"When you have eliminated the impossible, that which remains, however
improbable, must be the truth."
--- Sir Arthur Conan Doyle
On Thu, Sep 17, 2009 at 7:29 PM, David Rees <drees76 at gmail.com> wrote:
> On Thu, Sep 17, 2009 at 4:13 PM, Josh Luthman
> <josh at imaginenetworksllc.com> wrote:
> > To rule out DNS - are the boxes using a DNS cache server on themselves or
> > using a secondary server? What's the TTL on those A/CNAME records and
> > how long was your outage?
> All the boxes use a caching DNS server - the TTL on the host that went
> down that the affected slaves were monitoring was 5 minutes - it was
> down for close to 3 hours.
> I've since changed my config to use IP addresses for the host config,
> but it'd be nice to not have to and for the slaves to cache the last
> lookup in case there is a DNS failure...
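
On the last point in the quoted message, caching the last lookup could look
roughly like the sketch below. This is not how Smokeping itself behaves; the
class name and the injectable lookup function are assumptions made purely for
illustration.

```python
import socket

class LastKnownGoodResolver:
    """Resolve hostnames, falling back to the last successful answer."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup    # injectable so the fallback can be exercised
        self._last_good = {}     # hostname -> last IP that resolved

    def resolve(self, hostname):
        try:
            ip = self._lookup(hostname)
        except OSError:
            # socket.gaierror is a subclass of OSError. On a DNS failure,
            # reuse the previous answer if we have one, otherwise re-raise.
            if hostname in self._last_good:
                return self._last_good[hostname]
            raise
        self._last_good[hostname] = ip
        return ip
```

The trade-off is that a stale address keeps being used for as long as DNS is
down, which is fine for riding out an outage but wrong if the host really
moved in the meantime.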