[mrtg] Using MRTG in BIG setup
Tomas Zeman
tomas.zeman at aliatel.cz
Fri Jan 28 09:35:20 MET 2005
Hello friends,
since I have not found this design for using mrtg in a big setup described anywhere, I'd like to contribute it, hoping it will be useful to some of you.
We had to collect and report data from ca. 150 boxes (around 4000 interfaces) with MRTG/rrd on a single 2xPIII (512MB) Linux server, and tried several techniques.
First let me summarize the pros & cons of what is available:
1. mrtg as a daemon (RunAsDaemon option): a good scheme, for it loads into memory once and polls the devices one after another; it is kind to system resources. However, if half of the network is down, the polling timeouts can add up so much that MRTG does not get to the devices that are up within the 5-minute interval, and data are missing in the graphs.
2. mrtg forking (Forks option): As the reference manual says, "For situations with high latency or a great number of devices this will speed things up considerably". That's absolutely true and we used this scheme for 2 years or so, with the Forks option set as high as 70. If many boxes are down, polling of the other boxes is not affected much; we did not experience data losses in the graphs.
However, this scheme starts many processes at once, causing a high load average, filling the system cache, heavy disk operations and so on. The situation was so painful that if you tried to log on to the server while mrtg was polling the boxes, you waited dozens of seconds for a prompt. The crontabs were changed so that mrtg was started at different offsets within the 5-minute interval for groups of boxes; it helped, but the load average remained high.
These are the system stats (taken from the yearly and daily graphs):
Load avg: 3.5-4.0, during polling 7
Memory cached: 150M (very low)
PROPOSED SCHEME for big setups: MRTG & Nagios
Nagios (www.nagios.org) is a network monitoring program which periodically runs service and host checks; it has many other useful features as well.
The main advantage is that it schedules periodic service checks in such a way that the average system load is as small as possible. Any program which does that is suitable for the proposed setup.
The idea is to run mrtg for each box as a service check (i.e. one config file per box). The result is a constant load on the system: no load spikes, no heavy disk operations, no intense swapping etc.
Remark: One can argue that mrtg has to parse the config file for a particular box every time it runs. That's true, but in the normal setup (single/forking) it is the same. Only in the "RunAsDaemon" setup are the config files parsed just once.
These are system stats with the MRTG&Nagios integration:
Load average: 0.6 !!!
Memory cached: 380M
I was really astonished at how small the load average is for the same number of boxes/interfaces polled. Now, thanks to Nagios, an mrtg process is started about every 2 secs. Since the memory usage of mrtg for one box is small, more memory is left to serve as cache, so perl/mrtg/libs are not read from disk but from the cache. Disk operations on the rrd files are spread evenly over time, and filesystem operations are not intensive.
I'm sure it won't be a problem to poll 15000 interfaces with the same setup; unfortunately I don't have that many boxes...
If you have read this far and would like to try it, here are the setup instructions and scripts:
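The "about every 2 secs" figure is consistent with ca. 150 per-box checks spread evenly over one 5-minute check interval. A minimal sketch of that arithmetic (the function name is made up for illustration, and the even spreading is an idealization of what the Nagios scheduler does):

```shell
#!/bin/sh
# Average gap in seconds between two mrtg starts when the checks for
# all boxes are spread evenly over one check interval.
# $1 = number of boxes, $2 = check interval in minutes
avg_gap_seconds() {
    boxes=$1
    interval_min=$2
    echo $(( interval_min * 60 / boxes ))
}

# 150 boxes, 5-minute interval
avg_gap_seconds 150 5
```

With the numbers from this setup it prints 2.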
1. install mrtg, rrdtool as per documentation (http://people.ee.ethz.ch/~oetiker/webtools/mrtg/)
2. use cfgmaker to periodically update the configuration of the boxes - one cfg file per box
3. install Nagios as per documentation (www.nagios.org); nagios-plugins are not necessary for this setup but are very useful for other things
4. install Dan Bernstein's daemontools (http://cr.yp.to) - I recommend installing them on every server; they make life much easier
5. mkdir /var/log/mrtg && chown mrtg:mrtg /var/log/mrtg
6. This is my filesystem organization for Nagios & mrtg:
/var/mrtg/cfg holds cfg files in <<box-name>>.cfg naming
/var/mrtg/cgi-bin: 14all.cgi for graphing
/var/mrtg/html: html files
/var/mrtg/nagios/ - nagios installation (I use a separate nagios only for mrtg, i.e. install nagios with ./configure --prefix=/var/mrtg/nagios --with-nagios-user=mrtg)
/var/mrtg/rrds/ rrd files
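With this layout, the per-box cfg files from step 2 could be generated roughly as follows. This is only a sketch: the function names, the "box community" input format and the default community "public" are assumptions for illustration, not part of my setup. The functions only print the cfgmaker commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Paths taken from the filesystem organization above.
CFG_DIR=/var/mrtg/cfg
RRD_DIR=/var/mrtg/rrds

# Print the cfgmaker command for one box.
# $1 = box name, $2 = SNMP community
cfgmaker_cmd() {
    box=$1
    community=$2
    printf "cfgmaker --global 'WorkDir: %s' --global 'LogFormat: rrdtool' --output=%s/%s.cfg %s@%s\n" \
        "$RRD_DIR" "$CFG_DIR" "$box" "$community" "$box"
}

# Read "box [community]" lines from stdin and print one command per box.
# Empty lines and lines starting with # are skipped.
print_cfgmaker_cmds() {
    while read -r box community; do
        case $box in
            ''|'#'*) continue ;;
        esac
        cfgmaker_cmd "$box" "${community:-public}"
    done
}
```

For example, `print_cfgmaker_cmds < boxes.txt` shows the commands for a (hypothetical) list file boxes.txt, and piping that output to `sh` from a periodic cron job would refresh the cfg files.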
7. Nagios config (/var/mrtg/nagios/etc):
default setup, delete all sample hosts, hostgroups etc.
cat checkcommands.cfg:
# 'check-true' command definition
define command{
        command_name    check-true
        command_line    /bin/true
        }
define command{
        command_name    mrtg
        command_line    $USER1$/mrtg $ARG1$
        }
edit hosts.cfg and add:
define host{
        use                     generic-host    ; Name of host template to use
        host_name               mrtg
        alias                   MRTG Nagios Server
        address                 127.0.0.1
        check_command           check-true
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }
edit services.cfg and add:
define service{
        use                     generic-service ; Name of service template to use
        name                    mrtg-service    ; Name of this template
        host_name               mrtg
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          mrtg-admins
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        register                0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
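Individual boxes to be polled are added to services.cfg in this way (note that Nagios identifies a service by host_name plus service_description, so each box needs its own service_description):
Example: a box named box1 will have its mrtg configuration in /var/mrtg/cfg/box1.cfg, and its entry in /var/mrtg/nagios/etc/services.cfg will be: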
define service{
        use                     mrtg-service    ; Name of service template to use
        service_description     MRTG - Servers
        normal_check_interval   5
        check_command           mrtg!box1!
        }
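With ca. 150 boxes, writing these entries by hand gets tedious. A minimal sketch that prints one service definition per cfg file found in a directory; the function names are made up, and using "MRTG - <box>" as service_description is an assumption chosen so that every description on the host is unique:

```shell
#!/bin/sh
# Print a Nagios service definition for one box.
# $1 = box name (the cfg file name without .cfg)
emit_service() {
    box=$1
    cat <<EOF
define service{
        use                     mrtg-service
        service_description     MRTG - $box
        check_command           mrtg!$box
        }
EOF
}

# Print a definition for every <box>.cfg in the given directory.
# $1 = directory holding the per-box cfg files
emit_all() {
    dir=$1
    for cfg in "$dir"/*.cfg; do
        [ -e "$cfg" ] || continue     # no cfg files at all
        emit_service "$(basename "$cfg" .cfg)"
    done
}
```

Something like `emit_all /var/mrtg/cfg > mrtg-boxes.cfg` would then produce a file (the name is hypothetical) whose content can be appended to services.cfg after the mrtg-service template.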
8. mrtg service check script (put it in /var/mrtg/nagios/libexec):
cat /var/mrtg/nagios/libexec/mrtg
#!/bin/sh
# Nagios service check: run mrtg for one box, log its output via multilog
SYS=$1
LOG_FILES=$2
SIZE=99999
if [ -z "$SYS" ]; then
    echo "Usage: $0 system [nLOG_FILES]"
    exit 1
fi
if [ -z "$LOG_FILES" ]; then
    LOG_FILES=10
fi
LOCK=/tmp/mrtg.lock.$SYS
LOG=/var/log/mrtg/$SYS
CFG=/var/mrtg/cfg/$SYS.cfg
if [ ! -d "$LOG" ]; then
    mkdir -p "$LOG"
fi
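# $? after a pipeline is the exit status of multilog, not of mrtg,
# so the mrtg exit status is handed over in a temporary file.
STATUS=/tmp/mrtg.status.$SYS.$$
(echo "INFO: Start $SYS"; \
/command/setlock -n "$LOCK" mrtg "$CFG"; \
echo $? > "$STATUS"; \
echo "INFO: Finished $SYS"; \
) 2>&1 | /command/multilog t s$SIZE n$LOG_FILES "$LOG"
RC=$(cat "$STATUS" 2>/dev/null)
rm -f "$STATUS"
if [ "$RC" = "0" ]; then
    echo MRTG OK
else
    echo MRTG Error
    exit 1
fi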
9. that's all; you can check the output of the mrtg processes in the /var/log/mrtg/<<box-name>>/ dirs
I would very much appreciate any feedback/experience if you try this setup :))
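Going through 150 log directories by hand is not practical, so here is a small sketch that lists the boxes whose current multilog file contains anything besides the "INFO: Start" / "INFO: Finished" markers written by the check script. It assumes that mrtg prints only warnings and errors, so any other line is worth a look; the function name is made up for illustration.

```shell
#!/bin/sh
# List boxes whose current log holds lines other than the INFO markers.
# $1 = log root, e.g. /var/log/mrtg
noisy_boxes() {
    logroot=$1
    for dir in "$logroot"/*/; do
        # multilog writes the newest messages to the file "current"
        [ -f "${dir}current" ] || continue
        if grep -qv -e 'INFO: Start' -e 'INFO: Finished' "${dir}current"; then
            basename "$dir"
        fi
    done
}
```

Run it as `noisy_boxes /var/log/mrtg`. Because multilog is started with the t option, every line begins with a TAI64N timestamp; `tai64nlocal < /var/log/mrtg/box1/current` (tai64nlocal is part of daemontools) prints the log with readable local times.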
Best regards,
Tomas Zeman
SysAdmin & NMS-Developer