[rrd-users] Oddly excessive bandwidth reports coming from a T1, read on for details

Mon Apr 22 11:23:14 MEST 2002

Hello
  I'll start with the problem.  I'm using RRDtool to monitor 2 T1 lines on 
two different networks, and it consistently reports average aggregate bandwidth
values exceeding 1.54 Mbps, which is supposed to be impossible.

  (by the way, if this message is inappropriate for this list, please point me
somewhere; this issue is rather frustrating.)

  Here are some more details.  In one place I have a router with two T1s, and
I collect data in a sloppy way (note I don't administer these routers so I don't
have much say in how I get to collect data).  Basically, I have an expect script
run from cron, which, every 5 minutes, telnets into the router and gets its 
5 minute throughput averages for s1 and s2, and puts them on some graphs.  It 
constantly passes 1.54.  Even more oddly, it seems that output peaks at 1.54,
whereas input goes on uninhibited (albeit below 1.54) during this time frame.

example images:

old one:
http://ultrasoul.com/~matusa/graphs2.png

slightly newer; this one uses gprint to show averages:
http://ultrasoul.com/~matusa/damnit.png

Now: I have ntop listening to this network on a hub (not a switch that calls 
itself a hub; a real, stupid hub).  It reports similar values (this is a
network with minimal local traffic).  I have also run tests, and the 
throughput I generated was reflected accurately on the graphs.

  Next. I have another setup which is much more cleanly done.  There are
two routers each with a T1 (again, I didn't set that up), and I talk to
them through SNMP.  I collect ifInOctets and ifOutOctets from both of them,
and do all necessary computation.  In this case I use a COUNTER data source
type, whereas the last was a GAUGE.  I guess that is obvious.  Anyway, that
network isn't as overloaded, but it gets the same oddities.  I check the 
counter every minute, so there is no way that a weird abnormal burst could 
mess this up; if it spikes past 1.54 aggregate, then that is an average 
for one minute.
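
A minimal sketch of the rate computation described above, for anyone who wants
to check the arithmetic.  It assumes 32-bit interface counters and that
"aggregate" means input plus output on one interface; the function name and
the sample values are made up for illustration and are not taken from the
routers or from RRDtool itself.

```python
# Sketch: turn two ifInOctets/ifOutOctets samples into an average rate in Mbps.
COUNTER32_MODULUS = 2 ** 32  # assumption: the interface counters are 32-bit


def average_mbps(prev_octets, curr_octets, interval_seconds):
    """Average rate in Mbps between two counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modulo arithmetic absorbs a single counter wrap between samples.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    bits_per_second = delta_octets * 8 / interval_seconds
    return bits_per_second / 1_000_000


# Hypothetical one-minute samples (made-up numbers).
in_mbps = average_mbps(1_000_000, 8_500_000, 60)
out_mbps = average_mbps(4_294_000_000, 5_032_704, 60)  # counter wrapped here
aggregate_mbps = in_mbps + out_mbps
print(f"in={in_mbps:.2f} out={out_mbps:.2f} aggregate={aggregate_mbps:.2f} Mbps")
```

With a one-minute step, each value is the average over that minute, so a
short burst cannot by itself push a sample past the line rate.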

  Ntop listens on this network too.  It matches my data.  Also, someone 
fired up some crap Windows program (I'm doing all this on Linux, of course)
that talks SNMP and does graphs real-time: It ran for only a moment, so it
didn't catch values past 1.54, but while it was running, my data matched it.

  Just in case anyone is wondering, the difference is not slight, as the
graphs indicate--sometimes we hit 2.5 on one T1.  Also, the data I collect
from the routers is from the serial device of course, not the eth device 
(just in case someone thought I made a dumb error).  Oh yeah, I telnetted
into the routers a few times and added up the 5 minute average throughput
values by hand.  They were past 1.54 (by a lot; 1.87, for instance).

  If anyone can explain what the hell is going on, and why these T1s are
absurdly fat (not like I'm complaining), I would be much obliged.

Oh yeah. Tobi, if you read this far, RRDtool is a tool of great value to
me; I use it to monitor many aspects of my Linux boxes (collected via
/proc).  You RULE!!!!

-mateusz-

By the way, I wrote this email a week ago, and since I wasn't on the list,
it bounced.  I'm on the list now, a week later, and the issue is about
the same.  I haven't dealt with it much, having had other things to do.

Oh, one new development: MRTG agrees with me on another network (although
I haven't checked whether it passes the 1.54 boundary when mine does; you'd
expect it to).

Tobi, once again, thank you for making this excellent piece of software.
