[mrtg] Re: MRTG High Availability

Greg.Volk at edwardjones.com Greg.Volk at edwardjones.com
Tue Apr 23 16:51:18 MEST 2002

> I've been working with MRTG for some time now, and we've come to rely on 
> the output it produces.  I've now been asked to move MRTG from one 
> server onto two for high-availability reasons.
> Basically, I'm wondering if anyone has developed any sort of setup 
> involving failover.  i.e.: if one server running MRTG fails, another one 
> takes up the slack.  If anyone has a suggestion, I'm all ears.

I've put some idle (toilet) thought into this as I suspect
I'll be asked to do the same as soon as my current platform
incurs a noticeable outage. 

"What do you mean there's no redundancy for this system??"
"Ummmm...well, I kind of deployed it with old spare hardware
and very little free time, so it never was exactly an 'approved,
official system' thus it wasn't identified as 'mission 

From a simple-failover point of view, if you deploy mrtg on 
two separate boxes and have them both act as data collectors 
(mrtg) and data publishers (web servers), the goal of fault 
tolerance is well within reach. Adding a stateful load 
balancer (Cisco LocalDirector, Radware WSD, etc.) would 
probably complete the redundancy package quite nicely.

There are at least two problems with the above statement:

First, suppose one machine crashes and you get it back online 
30 minutes later. How do you sync its RRDs with those on the
machine that stayed up, so that there are no gaps in the failed
server's data? While I'm sitting here, the only thing that
comes to mind is to copy the RRDs over within one <poll 
interval>. This may be a problem if you're dealing with 
thousands upon thousands of RRDs that take longer than one 
<poll interval> to replicate. Also, if the two servers are 
separated by a relatively slow WAN link (for geographic
redundancy), copying that many small files in less than one 
<poll interval> only gets harder. 
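
To get a feel for whether the copy could finish inside one 
poll interval, a rough back-of-the-envelope estimate may help. 
The Python sketch below is only an illustration: the function 
name, the RRD count, the file size, the link speeds and the 
per-file overhead are all made-up assumptions, not 
measurements from any real installation. The 300 second 
interval corresponds to a 5-minute polling cycle.

```python
def estimate_rrd_copy(num_rrds, rrd_bytes, link_bits_per_sec,
                      per_file_overhead_sec, poll_interval_sec):
    """Estimate the time to copy every RRD once and whether that
    fits inside a single poll interval.

    All inputs are hypothetical planning figures; the model ignores
    compression, protocol overhead and parallel transfers.
    """
    # Raw transfer time for the file contents over the link.
    transfer_sec = num_rrds * rrd_bytes * 8 / link_bits_per_sec
    # Fixed cost paid once per file (connection setup, latency, ...).
    overhead_sec = num_rrds * per_file_overhead_sec
    total_sec = transfer_sec + overhead_sec
    return total_sec, total_sec < poll_interval_sec


# Assumed figures: 5000 RRDs of 100 kB each, 0.05 s overhead per file,
# and a 300 s poll interval.
wan_total, wan_fits = estimate_rrd_copy(5000, 100_000, 1_500_000, 0.05, 300)
lan_total, lan_fits = estimate_rrd_copy(5000, 100_000, 100_000_000, 0.05, 300)

print(f"1.5 Mbit/s WAN: {wan_total:.0f} s, fits: {wan_fits}")
print(f"100 Mbit/s LAN: {lan_total:.0f} s, fits: {lan_fits}")
```

With these assumed numbers the WAN copy takes far longer than 
one interval, while the LAN copy only just squeezes in, and 
most of the LAN time is per-file overhead rather than 
bandwidth. That is the "many small files" effect.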

The other caveat that comes to mind is that with a redundant
polling server you multiply _all_ of your mrtg-related snmp 
traffic by two. This is only an issue where you might be 
dealing with small or congested wide area links.
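
The doubling can be put into rough numbers the same way. This 
is again only a sketch: the function name, the number of 
targets and the bytes exchanged per target per poll are 
assumptions picked for illustration, and real SNMP request and 
response sizes vary with the device and the OIDs queried.

```python
def snmp_poll_load_bps(num_targets, bytes_per_target_poll,
                       poll_interval_sec, num_pollers=1):
    """Average SNMP polling load in bits per second.

    bytes_per_target_poll is an assumed request-plus-response size
    for one target; the result is an average over the poll interval,
    not the peak during the polling burst.
    """
    bits_per_cycle = num_targets * bytes_per_target_poll * 8
    return bits_per_cycle * num_pollers / poll_interval_sec


# Assumed figures: 2000 targets, 200 bytes exchanged per target per
# poll, 300 s poll interval.
one_poller = snmp_poll_load_bps(2000, 200, 300, num_pollers=1)
two_pollers = snmp_poll_load_bps(2000, 200, 300, num_pollers=2)

print(f"one poller:  {one_poller:.0f} bit/s")
print(f"two pollers: {two_pollers:.0f} bit/s")
```

Averaged over the interval the load looks small, but the 
polls arrive in a burst at the start of each cycle, so on a 
thin or already congested link the second poller can still be 
noticeable.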


