[mrtg] Re: MRTG High Availability

Tue Apr 23 16:52:04 MEST 2002

If it needs to be high-availability, you could put the RRDs, configs, and
HTML files on NFS-mounted storage devices that are mirrored. That way you
should never lose your data. The failover from one MRTG server to the other
is the tricky part. You'll have to decide what counts as MRTG failing, then
write some kind of program or script that runs on the backup MRTG server
every few minutes or seconds. If its tests come back TRUE, fire up MRTG on
that box. Since all the data files (configs, RRDs, and HTML) are NFS-mounted
from another server or a network storage device, they really don't care
which server updates them.
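
As a rough illustration of that check, here is a minimal sketch in Python.
It assumes one possible definition of "MRTG failing": no .rrd file on the
shared storage has been updated for a couple of poll intervals. The function
names, the 300-second interval, and the two-missed-polls threshold are all
assumptions made for the example, not anything MRTG provides.

```python
import os
import time

POLL_INTERVAL = 300  # seconds; assumed 5-minute polling
MISSED_POLLS = 2     # assumption: two missed polls counts as "failed"


def newest_mtime(directory, suffix=".rrd"):
    """Return the latest modification time among matching files, or None."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffix):
                mtime = os.path.getmtime(os.path.join(root, name))
                if newest is None or mtime > newest:
                    newest = mtime
    return newest


def primary_has_failed(directory, now=None,
                       poll_interval=POLL_INTERVAL,
                       missed_polls=MISSED_POLLS):
    """True if the newest RRD is older than the allowed number of polls."""
    now = time.time() if now is None else now
    newest = newest_mtime(directory)
    if newest is None:
        # No RRDs visible (for example the NFS mount is missing).
        # There is nothing to judge by, so do not take over.
        return False
    return (now - newest) > poll_interval * missed_polls
```

The backup server would run something like this from cron against the shared
directory and start MRTG with the shared config only when it returns True.
One thing the sketch does not handle is the primary coming back while the
backup is still polling, so some kind of lock on the shared storage would
probably be needed as well.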

Management shouldn't have a problem spending the money to make it redundant
if they see a need for this to be a High-Availability system.

Just my .02
Trent

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=
    Trent Melcher  (tmelcher at trilogytel.com)
            StarTouch International
         Network/System Administrator
            Phone:402.346.4600x103
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=

-----Original Message-----
From: mrtg-bounce at list.ee.ethz.ch [mailto:mrtg-bounce at list.ee.ethz.ch]On
Behalf Of Greg.Volk at edwardjones.com
Sent: Tuesday, April 23, 2002 9:51 AM
To: mrtg at list.ee.ethz.ch; jsignalness at btinet.net
Subject: [mrtg] Re: MRTG High Availability

>
> I've been working with MRTG for some time now, and we've come to rely on
> the output it produces.  I've now been asked to move MRTG from one
> server onto two for high-availability reasons.
>
> Basically, I'm wondering if anyone has developed any sort of setup
> involving failover.  i.e.: if one server running MRTG fails, another one
> takes up the slack.  If anyone has a suggestion, I'm all ears.
>

I've put some idle (toilet) thought into this as I suspect
I'll be asked to do the same as soon as my current platform
incurs a noticeable outage.

Boss:
"What do you mean there's no redundancy for this system??"
Me:
"Ummmm...well, I kind of deployed it with old spare hardware
and very little free time, so it never was exactly an 'approved,
official system' thus it wasn't identified as 'mission
critical'"

From a simple-failover point of view, deploying mrtg on two
separate boxes, each acting as both data collector (mrtg)
and data publisher (web server), puts the goal of fault
tolerance well within reach. The addition of a stateful load
balancer (cisco local-director, radware WSD, etc) would
probably complete the redundancy package quite nicely.

There are at least two problems with the above statement:

One machine crashes. You get it back online 30 minutes
later. How do you sync its RRDs with the machine that
stayed up so there are no gaps in the failed server's data?
While I'm sitting here, the only thing that comes to mind
is to copy the RRDs over within one <poll interval>, so the
copy finishes before the next update. This may be a problem
if you're dealing with thousands upon thousands of RRDs
that take longer than one <poll interval> to replicate.
Also, if the two servers are separated by a relatively slow
WAN link (for geographic redundancy), the link will only
exacerbate the time problem when copying that many small
files.
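
To get a feel for whether such a copy could fit, here is a back-of-the-envelope
sketch in Python. The model is an assumption made for illustration: files are
copied one after another, each costing a fixed per-file overhead (round trips,
open/close) plus its transfer time. The function names and all the numbers are
hypothetical, so plug in your own.

```python
def estimate_sync_seconds(file_count, bytes_per_file,
                          link_bits_per_second, per_file_overhead_seconds):
    """Estimate the time to copy file_count files one at a time."""
    transfer = (bytes_per_file * 8) / link_bits_per_second
    return file_count * (per_file_overhead_seconds + transfer)


def fits_in_poll_interval(file_count, bytes_per_file,
                          link_bits_per_second, per_file_overhead_seconds,
                          poll_interval_seconds=300):
    """True if the estimated copy finishes before the next poll."""
    estimate = estimate_sync_seconds(file_count, bytes_per_file,
                                     link_bits_per_second,
                                     per_file_overhead_seconds)
    return estimate < poll_interval_seconds


# Example with made-up figures: 1,000 RRDs of 100 kB each over a
# 10 Mbit/s link, with 50 ms of overhead per file.
seconds = estimate_sync_seconds(1000, 100_000, 10_000_000, 0.05)
```

With those made-up figures the estimate comes to roughly 130 seconds, which
fits in a 5-minute interval. Ten times as many files on the same link does
not. The per-file overhead is what hurts over a WAN, since it scales with the
number of files rather than their size.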

The other caveat that comes to mind is that with a redundant
polling server you multiply _all_ of your mrtg-related snmp
traffic by two. This is only an issue where you might be
dealing with small or congested wide area links.
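
The doubling is easy to put a number on. This is a minimal sketch in Python
that assumes every target costs a fixed number of bytes per poll (request
plus response); the function name and the figures are placeholders for
illustration, not measurements.

```python
def snmp_bits_per_second(targets, bytes_per_poll,
                         poll_interval_seconds, pollers=1):
    """Average SNMP load generated by `pollers` identical polling servers."""
    bits_per_cycle = targets * bytes_per_poll * 8
    return pollers * bits_per_cycle / poll_interval_seconds


# Made-up example: 1,000 targets, 200 bytes per poll, 5-minute interval.
one_poller = snmp_bits_per_second(1000, 200, 300, pollers=1)
two_pollers = snmp_bits_per_second(1000, 200, 300, pollers=2)
```

Averaged over the interval the absolute figure is small, but polls tend to
go out in a burst at the start of each cycle, so the peak on a thin link
will be higher than this average suggests.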

--
Unsubscribe mailto:mrtg-request at list.ee.ethz.ch?subject=unsubscribe
Archive     http://www.ee.ethz.ch/~slist/mrtg
FAQ         http://faq.mrtg.org    Homepage     http://www.mrtg.org
WebAdmin    http://www.ee.ethz.ch/~slist/lsg2.cgi
