[mrtg] Bug in 2.9.29???

Peter Glanville peter_glanville at cuk.canon.co.uk
Fri Jun 20 12:04:56 MEST 2003


Hi all,
I have just upgraded from 2.9.22 to 2.9.29 and have started experiencing a
new problem that causes MRTG to fail and stop running.
It has now fallen over two nights running (I upgraded on Monday).

I am running on a Windows 2000 Professional workstation (SP3).
My config 'includes' a series of other files, and MRTG runs as a daemon
(RunAsDaemon) with a 5-minute interval.
It would APPEAR that when a target fails, MRTG tries to skip the other
targets at that address, but does not do so properly.

The screen display (inserted at the end of this email) shows one Target
failing (172.16.40.3). That particular device was down last night. I
include this as it MIGHT be relevant.
It then moved on through the config (9 other switches, with about 280
ports) and then tried to read a number of ports from 172.16.32.154 (3Com
switch3000).
It has an SNMPGet failure on port 107 (being written to wdh3000_15_407), and
warns that it has no data for that port. (This should have worked; I have no
idea why it failed.)
It then warns that it will skip other targets with that IP address.
In the config, the next target is port 108 (written to wdh3000_15_408), so
it should have been skipped.

Having printed the SNMP errors and warnings, the display then lists the
errors encountered in that round of the MRTG poll.
It lists the down device, and then the failed port (as expected).
But it then reports an error on the next port (108): the update
'1056050365:0:0' uses the same time as the last update already in the file.
Then the whole MRTG process stops, and the batch file ends. Nothing else
gets monitored overnight.

1056050365 = 19 June 2003 19:19:25 GMT (20:19:25 BST)
The file being written to (wdh3000_15_408.rrd) is date stamped 19 June 2003
20:24 BST (GMT + 1)
That port has not had anything attached since rebooting the switch, but I
simply monitor the whole switch rather than forgetting to add ports to MRTG
when reconnecting servers etc. So zero values could either be a reading, or
lack of reading.
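
The timestamp conversion above can be double-checked with a short Python sketch. It assumes BST can be modelled as a fixed UTC+1 offset (true for June 2003); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Timestamp taken from the RRD error message in the screen output.
RRD_TIMESTAMP = 1056050365

# Assumption: BST is modelled as a fixed UTC+1 offset.
BST = timezone(timedelta(hours=1), name="BST")

utc_time = datetime.fromtimestamp(RRD_TIMESTAMP, tz=timezone.utc)
bst_time = utc_time.astimezone(BST)

print(utc_time.isoformat())  # 2003-06-19T19:19:25+00:00
print(bst_time.isoformat())  # 2003-06-19T20:19:25+01:00
```

So the time in the rejected update is five minutes earlier than the 20:24 BST file date stamp, i.e. one polling interval behind.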

So it looks like MRTG has failed on a target, decided to skip the rest of
that Host, but has already opened the file for the next target, realised
that it needs to skip it, and still tried to write duff data. It is
attempting to update the file at 20:24, but using the time value for 20:19.
Not surprisingly, this is the time value already in the file, so the update
is rejected and the whole thing dies.
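
To make that guess concrete, here is a minimal Python sketch of the failure mode I suspect. It is a toy model, not MRTG or RRDtool code: `ToyRrd`, `poll_round` and the `buggy_skip` flag are hypothetical names invented for illustration. The only behaviour taken from the screen output is that an update whose time is not later than the last update time is rejected.

```python
class ToyRrd:
    """Toy stand-in for an .rrd file: remembers only the last update time."""

    def __init__(self, last_update):
        self.last_update = last_update

    def update(self, timestamp, in_value, out_value):
        # Mirrors the rule quoted in the error: minimum one second step.
        if timestamp <= self.last_update:
            raise ValueError(
                f"illegal attempt to update using time {timestamp} "
                f"when last update time is {self.last_update}"
            )
        self.last_update = timestamp


def poll_round(targets, files, failed_hosts, now, buggy_skip):
    """Process one round; targets is a list of (name, host) pairs."""
    errors = []
    for name, host in targets:
        if host in failed_hosts:
            if buggy_skip:
                # Suspected bug (an assumption): the skipped target is still
                # written, reusing the old time and zero values.
                try:
                    files[name].update(files[name].last_update, 0, 0)
                except ValueError as exc:
                    errors.append((name, str(exc)))
            continue  # a correct skip leaves the file untouched
        files[name].update(now, 1, 1)
    return errors


targets = [("wdh3000_15_408", "172.16.32.154")]
failed = {"172.16.32.154"}
now = 1056050365 + 300  # one 5-minute interval after the last update

buggy_errors = poll_round(
    targets, {"wdh3000_15_408": ToyRrd(1056050365)}, failed, now, True)
clean_errors = poll_round(
    targets, {"wdh3000_15_408": ToyRrd(1056050365)}, failed, now, False)
print(len(buggy_errors), len(clean_errors))
```

In this model the buggy path reproduces the "illegal attempt to update" message for port 108, while a clean skip produces no error at all.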

I upgraded the MRTG version on Monday. The failures were on Wednesday night
(Thursday morning, 1am) and Thursday night at 8:24pm.
Possibly overnight backup traffic has caused the SNMP traffic to get lost,
hence one target fails.


Any suggestions on debugging?
I am happy to try things, but being a winDoze user I have limited access to
the code.

I have already thought of some workarounds, but can anyone suggest a cure?

Or can anyone suggest where my interpretation has gone wrong?


The screen showing the error reads thus:

SNMP Error:
no response received
SNMPv1_Session (remote host: "172.16.40.3" [172.16.40.3].161)
                  community: "public"
                 request ID: 811520969
                PDU bufsize: 8000 bytes
                    timeout: 2s
                    retries: 5
                    backoff: 1)
SNMPGET Problem for ifHCInOctets.1 ifHCInOctets.2 on public at 172.16.40.3
WARNING: skipping because at least the query for ifHCInOctets.1 on
172.16.40.3 did not succeed
WARNING: no data for ifHCInOctets&ifHCInOctets:public at 172.16.40.3. Skipping
further queries for Host 172.16.40.3 in this round.
SNMP Error:
no response received
SNMPv1_Session (remote host: "172.16.32.154" [172.16.32.154].161)
                  community: "public"
                 request ID: 1417555437
                PDU bufsize: 8000 bytes
                    timeout: 2s
                    retries: 5
                    backoff: 1)
SNMPGET Problem for ifInOctets.107 ifOutOctets.107 on public at 172.16.32.154
WARNING: skipping because at least the query for ifInOctets.107 on
172.16.32.154 did not succeed
WARNING: no data for ifInOctets&ifOutOctets:public at 172.16.32.154. Skipping
further queries for Host 172.16.32.154 in this round.
ERROR: Target[wall3500_1][_IN_] ' $$target[2]{$mode} ' did not eval into
defined data
ERROR: Target[wall3500_1][_OUT_] ' $$target[2]{$mode} ' did not eval into
defined data
ERROR: Target[wdh3000_15_407][_IN_] ' $$target[285]{$mode} ' did not eval
into defined data
ERROR: Target[wdh3000_15_407][_OUT_] ' $$target[285]{$mode} ' did not eval
into defined data
ERROR: Cannot update /mrtgdata\wdh3000_15_408.rrd with '1056050365:0:0'
illegal attempt to update using time 1056050365 when last update time is
1056050365 (minimum one second step)

Press any key to continue . . .
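
For anyone wanting to check such errors without reading the numbers by eye, a small Python sketch can pull the two times out of the final message. The regular expression is only my guess at the message layout, inferred from the single example above, and the names are illustrative.

```python
import re

# The final error from the screen output, with the mail line-wrapping undone.
ERROR_LINE = (
    "ERROR: Cannot update /mrtgdata\\wdh3000_15_408.rrd with '1056050365:0:0' "
    "illegal attempt to update using time 1056050365 when last update time is "
    "1056050365 (minimum one second step)"
)

# Assumption: pattern inferred from this one message only.
PATTERN = re.compile(
    r"Cannot update (?P<file>\S+) with '(?P<data>[^']*)'.*?"
    r"using time (?P<new>\d+) when last update time is (?P<last>\d+)"
)

match = PATTERN.search(ERROR_LINE)
rrd_file = match.group("file")
new_time = int(match.group("new"))
last_time = int(match.group("last"))

# A difference of zero means the update reused the time already in the file.
print(rrd_file, new_time - last_time)
```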


Regards
Peter Glanville
Network Analyst, ICtS, Canon (UK) Ltd



