[mrtg] graph spike problem - stumped.

Tue Sep 3 18:00:00 MEST 2002

Hello,

I recently wrote in about the proper syntax and potential problems with monitoring bandwidth on catalyst switches by blade.

While this is working fine for many of the blades on many of the switches we have, a few blades seem to be generating faulty results. Graphs for these blades show very large spikes (swings of dozens or even hundreds of megabits), which rescale the graph so that the 'real' results can barely be seen.

I am monitoring each blade by using a separately crafted config file with entries for each blade as the sum of all its ports:

Target[blade]: port1:community@switch:::::2 + port2:community@switch:::::2 + ... + port48:community@switch:::::2
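
For illustration only, a line of this shape could be generated rather than typed by hand. The sketch below is not part of MRTG; the function name and its arguments are hypothetical, and it assumes the usual port:community@switch form with the trailing 2 selecting SNMPv2c.

```python
def blade_target(name, ports, community, switch, snmp_version=2):
    """Build an MRTG Target line that sums every port on one blade."""
    # One term per port; the SNMP version sits in the last of the
    # colon-separated option fields after the switch name.
    terms = [f"{p}:{community}@{switch}:::::{snmp_version}" for p in ports]
    return f"Target[{name}]: " + " + ".join(terms)


if __name__ == "__main__":
    # Hypothetical three-port blade, just to show the shape of the line.
    print(blade_target("blade", [1, 2, 3], "community", "switch"))
```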

I am polling with SNMPv2c, so I don't think it is a counter-wrap issue. I have also set MaxBytes to (number of ports per blade) * (100 Mbit, expressed in bytes per second), which for the most part is either 300000000 or 600000000. Most of these ports actually run at 10 Mbit, but this changes occasionally and I do not always know for sure, so the config file simply assumes they are all capable of 100 Mbit.
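
The arithmetic behind those two values can be checked with a short sketch. The 24- and 48-port blade sizes are an assumption inferred from the two numbers quoted above, and the function name is made up for illustration.

```python
def max_bytes(port_count, port_speed_mbit=100):
    """Upper bound for a blade, in bytes per second, as MaxBytes expects."""
    # 100 Mbit/s is 100,000,000 bits/s, i.e. 12,500,000 bytes/s per port.
    bytes_per_port = port_speed_mbit * 1_000_000 // 8
    return port_count * bytes_per_port


if __name__ == "__main__":
    for ports in (24, 48):  # assumed blade sizes
        print(f"{ports} ports: MaxBytes = {max_bytes(ports)}")
```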

I am using RRDTool. I have read about problems where a lack of communication with the devices causes these spikes, so I looked at a dump of the .rrd for one of the blades I am having a problem with. The output does not contain any -1, 0 or NaN values that might explain the swings. However, the log files occasionally contain errors about a particular port's query not succeeding, and then, for the blade target, I get something like:
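
A rough way to repeat that check across many files is to scan the text produced by `rrdtool dump` for rows that hold NaN or a rate above the configured ceiling. This is a minimal sketch, assuming the usual `<row><v> ... </v></row>` layout of the dump; the function name and the `limit` parameter are hypothetical, and `limit` would typically be the blade's MaxBytes.

```python
import math
import re

ROW_RE = re.compile(r"<row>(.*?)</row>")
VALUE_RE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def suspicious_rows(dump_text, limit):
    """Yield (row_index, values) for rows with NaN or a value above limit."""
    for index, row in enumerate(ROW_RE.findall(dump_text)):
        values = [float(v) for v in VALUE_RE.findall(row)]
        if any(math.isnan(v) or v > limit for v in values):
            yield index, values


if __name__ == "__main__":
    # Made-up sample rows, not data from a real .rrd file.
    sample = (
        "<row><v> 1.5e+06 </v><v> 2.0e+06 </v></row>\n"
        "<row><v> NaN </v><v> 2.0e+06 </v></row>\n"
        "<row><v> 4.0e+08 </v><v> 1.0e+06 </v></row>\n"
    )
    for index, values in suspicious_rows(sample, limit=300000000):
        print(index, values)
```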

ERROR: Target[whatever][_IN_] ' $$target[####]{$mode} + ... + $$target[####]{$mode} ' warn: use of uninitialized value at (eval 197) line 1

These errors are not logged for all of the blades in question.

What else should I be looking for? Is this just a matter of a couple of ports timing out and throwing off the whole sum? I am monitoring several thousand ports across ~150 blades, on a dual 1.4GHz P3 machine with 512MB RAM, a somewhat slow IDE disk and a full-duplex 100 Mbit Ethernet connection. Is the machine the problem? Processor and bandwidth utilization on the machine itself (as monitored by MRTG!) do not seem to suggest a problem there.

Is there a way I can make the 'uninitialized value' take the previous value so I can get rid of the spikes? This would seem difficult, since the per-blade data gathering does not also store per-port information. As it stands, the per-blade graphs are useless to me for the blades that exhibit this problem.
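
To make the idea concrete: this is not an MRTG option that I know of, only a sketch of the behaviour being asked for, as an external collector script might implement it. It assumes per-port rates are available with `None` marking a failed query, and every name in it is hypothetical.

```python
def blade_total(readings, last_good):
    """Sum per-port rates, reusing the last good value for failed ports.

    readings:  dict of port -> rate, or None when the query failed
    last_good: dict of port -> most recent successful rate (updated in place)
    """
    total = 0.0
    for port, rate in readings.items():
        if rate is None:
            # Query failed: fall back to the previous value, or 0.0 if
            # this port has never answered.
            rate = last_good.get(port, 0.0)
        else:
            last_good[port] = rate
        total += rate
    return total


if __name__ == "__main__":
    state = {}
    print(blade_total({1: 10.0, 2: 20.0}, state))  # both ports answered
    print(blade_total({1: None, 2: 25.0}, state))  # port 1 timed out
```

The catch is the one noted above: the per-port state has to be kept somewhere between polling runs.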

-Will

____________________________________
Will Saxon
Systems Programmer
Division of Housing Network Services
University of Florida
Phone: (352) 392-2161
Fax: (352) 392-6819
Email: WillS at housing.ufl.edu 
