[mrtg] number of monitored routers is it related to MRTG stopping RRDB update?

Steve Shipway s.shipway at auckland.ac.nz
Tue Sep 1 23:59:17 CEST 2009


Error #5 is strange: it seems to indicate that something else has started a second copy of MRTG.  Are you starting MRTG from crontab, or from a service controller that incorrectly thinks MRTG has stopped and attempts to restart it?  That could cause locking issues that prevent updates.
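
One way to tell a genuine second copy from a leftover lock is to check whether the PID recorded in the pid file still belongs to a live process. The following is only a rough sketch: it assumes the pid file holds a single decimal PID and that the host is POSIX (as RHEL is). The helper names are made up for illustration and are not part of MRTG.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def check_pid_file(path: str) -> str:
    """Classify a pid file as 'missing', 'invalid', 'stale' or 'running'.

    Assumption: the file contains one decimal PID and nothing else.
    """
    try:
        with open(path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return "missing"
    if not text.isdigit() or int(text) <= 0:
        return "invalid"
    return "running" if pid_is_running(int(text)) else "stale"


if __name__ == "__main__":
    # Path taken from the error message quoted below.
    print(check_pid_file("/data/mrtg-2/cfg/mrtg.pid"))
```

A result of "running" while updates have stopped would suggest a hung MRTG process rather than a second copy; "stale" would suggest the previous run died without cleaning up.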

Error #6 indicates that the SNMP read of the remote device is returning 'undef', i.e. it is getting no response to the SNMP query.  If this is happening a lot (i.e. on all checks after things go wrong) then it would indicate a network issue, possibly a firewall or access control blocking you via an adaptive rule (unlikely but possible).
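
To see whether the 'undef' results are occasional or cover every check after things go wrong, the debug output can be tallied per hour. This is a minimal sketch that assumes every relevant debug line looks like the "SNMPfound" line quoted below (timestamp first, quoted values last) and that successful polls show numeric values in the same position. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g.: 2009-09-01 00:13:47 -- --snpo: SNMPfound -- 'undef', 'undef'
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} .*SNMPfound -- (.*)$"
)


def undef_counts_by_hour(lines):
    """Return {"YYYY-MM-DD HH": (failed_polls, total_polls)}.

    A poll counts as failed when every quoted value on the line is 'undef'.
    Lines that do not match the assumed format are ignored.
    """
    total = Counter()
    failed = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip().strip('"'))
        if not match:
            continue
        hour, rest = match.groups()
        values = re.findall(r"'([^']*)'", rest)
        if not values:
            continue
        total[hour] += 1
        if all(v == "undef" for v in values):
            failed[hour] += 1
    return {hour: (failed[hour], total[hour]) for hour in sorted(total)}
```

If the failed count jumps from near zero to equal the total at the moment updates stop, that would point towards something blocking SNMP traffic rather than towards individual unresponsive devices.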

FYI, we are monitoring over 15000 metrics on over 700 devices and do not experience this problem, although we are using a home-grown multi-threaded scheduler.

Steve

________________________________
From: mrtg-bounces at lists.oetiker.ch [mailto:mrtg-bounces at lists.oetiker.ch] On Behalf Of Alex Koueik
Sent: Wednesday, 2 September 2009 1:09 a.m.
To: mrtg at lists.oetiker.ch
Subject: [mrtg] number of monitored routers is it related to MRTG stopping RRDB update?

Hi,

We are currently monitoring 300 routers. Is there anyone monitoring a higher number of devices with MRTG RRDB?
Our problem is that the MRTG server stops updating the RRDBs while the MRTG process is still running.
We tried the following:

1-      Stopped and restarted the MRTG processes; the RRDBs were updated for a period and then stopped again.

2-      Increased the child processes (Forks) from 4 to 12; the RRDBs were updated for a period and then stopped again.

3-      We installed a carbon copy of MRTG on another machine; the problem occurred on both machines at different times.

4-      We debugged SNMP on some of the routers during the outage; they were not receiving any SNMP requests from MRTG.

5-    We are getting this in the messages log: "ERROR: I Quit! Another copy of mrtg seems to be running. Check /data/mrtg-2/cfg/mrtg.pid"

6-    In the MRTG debug output we are getting the following:

"ERROR: Target[route1][_IN_] ' $target->[14]{$mode} ' did not eval into defined data"


"SNMPGET Problem for 1.3.6.1.2.1.31.1.1.1.6.1 1.3.6.1.2.1.31.1.1.1.10.1 on community at xxx.xxx.xxx.xxx::::1:2:v4only
            at /data/mrtg-2/bin/mrtg line 2149

2009-09-01 00:13:47 -- --snpo: SNMPfound -- 'undef', 'undef'"




OS: Red Hat Enterprise Linux ES release 4 (Nahant) 3.4.3-9.EL4
MRTG: mrtg-2.15.1
RRDtool: rrdtool-1.2.19

Any pointers on where the problem lies would be highly appreciated.

Best regards,


Alex Koueik
NavLink
