[mrtg] Problem with the way MRTG takes samples.

Sat Sep 4 00:11:51 MEST 1999

I work for a local ISP in Utah and we've been using MRTG for some time to
graph and give us stats on multiple Cisco router interfaces.  After
learning a great deal more about SNMP I wrote a Perl script of my own from
scratch to pull the SNMP in and out octet OIDs at 5min intervals to generate
traffic logs on Cisco interfaces.  While learning and writing this script
I noticed something about the way MRTG records traffic samples that gave
me cause for concern.

These are 6 lines out of an MRTG logfile for a frame-relay connection we've
been monitoring.  They cover 9/2/99 18:10:0 to 9/2/99 18:35:0.

             Average   Maximum
   EPOC#     In   Out  In   Out
1- 936317400  138 1351  286 2913
2- 936317700  907   22 1670   22
3- 936318000 2208   22 2693   22
4- 936318300 3059   22 3380   22
5- 936318600 2069 4487 3380 8240
6- 936318900  450 3793  968 8240
** The only modification of these lines was to align columns

The cronjob that runs MRTG to generate this log runs at 5min intervals on
these minutes:
2,7,12,17,22,27,32,37,42,47,52,57

According to the EPOC#s (seconds since 1970) in the first column of the
lines out of the MRTG log, the in and out octets were pulled via SNMP off
of the router interface EXACTLY every 5min on the 5min mark (min
0,5,10,15,etc...):

  936317400 = 9/2/99 18:10:0
  936317700 = 9/2/99 18:15:0
  936318000 = 9/2/99 18:20:0
  936318300 = 9/2/99 18:25:0
  936318600 = 9/2/99 18:30:0
  936318900 = 9/2/99 18:35:0
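
The conversions above can be reproduced with a short script.  This is only
a sketch: it assumes the local time zone was Mountain Daylight Time (UTC-6),
and the names MDT, STAMPS, to_local and deltas are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time was Mountain Daylight Time, a fixed UTC-6 offset.
MDT = timezone(timedelta(hours=-6), "MDT")

# Timestamps copied from the first column of the log excerpt.
STAMPS = [936317400, 936317700, 936318000, 936318300, 936318600, 936318900]


def to_local(ts, tz=MDT):
    """Format a Unix timestamp as local date and time."""
    return datetime.fromtimestamp(ts, tz).strftime("%Y-%m-%d %H:%M:%S")


def deltas(stamps):
    """Return the differences between consecutive timestamps, in seconds."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for ts in STAMPS:
        print(ts, "=", to_local(ts))
    print("seconds between samples:", deltas(STAMPS))
```

Every difference comes out as exactly 300 seconds.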

  This can't be possible because:
    1- The cronjob that updates this log doesn't run on the 5min mark.  It
       runs every 5min on the 2nd min mark (2,7,12,etc...).
    2- The cronjob doesn't just update this log.  The script that is run
       updates 82 logs one at a time, and this log is the 11th to be
       updated.  There would have been at least several seconds before the
       script would have updated this log.
    3- All 82 logs, which this script updates one at a time, have
       exactly the same EPOC#s, saying that they all got updated at
       exactly the same time.
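
Point 1 can be checked mechanically.  The sketch below compares the minute
of the hour of each logged timestamp against the cron schedule.  The helper
names are hypothetical, and it assumes the local time zone is a whole number
of hours from UTC, so that the minute of the hour is the same in both.

```python
# Minutes on which the cronjob runs, as listed in the post.
CRON_MINUTES = {2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57}

# Timestamps copied from the first column of the log excerpt.
LOG_STAMPS = [936317400, 936317700, 936318000, 936318300, 936318600, 936318900]


def minute_of_hour(ts):
    """Minute within the hour (valid for whole-hour UTC offsets)."""
    return (ts % 3600) // 60


def could_be_cron_time(ts, cron_minutes=CRON_MINUTES):
    """True if the timestamp falls within a minute on which cron fires."""
    return minute_of_hour(ts) in cron_minutes


if __name__ == "__main__":
    for ts in LOG_STAMPS:
        print(ts, "minute", minute_of_hour(ts),
              "second", ts % 60,
              "matches cron:", could_be_cron_time(ts))
```

None of the logged timestamps falls on a minute when the cronjob runs, and
all of them land on second 0.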

MRTG seems to me to be assuming or "faking" these EPOC#s so that the
difference between any 2 consecutive ones will always be 300 seconds!?

Problems:

-The way you calculate bps from SNMP octets is:
   ((CurrentOctet - LastOctet) / Seconds) * 8
 where:
   CurrentOctet = the current interface in or out octets read via SNMP
   LastOctet = the last in or out octets that were read via SNMP
   Seconds = the number of seconds that have elapsed between the pulling
             of the LastOctet and the CurrentOctet
 You would calculate Seconds by getting the difference between the last
 EPOC# and the current EPOC#.  If Seconds isn't correct/real, the bps will
 be wrong.
-Also because MRTG is "faking" the EPOC#s, the time that the in and out
 octets were pulled is wrong, causing the timeline of the data to be up to
 5min off.
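
To show how much the Seconds value matters, here is the formula above as a
small function.  The counter values and the 287-second interval are invented
for illustration only, and the sketch ignores counter wrap-around.

```python
def bits_per_second(current_octets, last_octets, seconds):
    """((CurrentOctet - LastOctet) / Seconds) * 8, as given in the post.

    Simplification: counter wrap-around is not handled here.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (current_octets - last_octets) / seconds * 8


if __name__ == "__main__":
    # Hypothetical counter readings, not taken from any real interface.
    last_octets, current_octets = 1_000_000, 1_300_000

    assumed = bits_per_second(current_octets, last_octets, 300)
    # Hypothetical real interval: the poll ran 13 seconds early.
    actual = bits_per_second(current_octets, last_octets, 287)

    print("assuming 300 s:", assumed)
    print("using 287 s:   ", actual)
    print("relative error:", (assumed - actual) / actual)
```

With the same octet difference, assuming 300 seconds when only 287 really
elapsed understates the rate.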

So in essence the timeline of the MRTG graphs, and the bps samples that
they represent, are being calculated incorrectly!?

Is there a reason why MRTG isn't pulling the current time/EPOC# every
time after it pulls the SNMP in and out octets?  Is there some option that
needs to be set in the config file to force it to not assume an exact 300
seconds?  Was this a design decision for MRTG?  Calculating exact seconds
between pulls of octets should be as simple as getting the difference
between the current EPOC# and the last EPOC#, as long as the EPOC#s are
pulled after every pull of SNMP octets.  Is there a reason that this was
not done?
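
What I have in mind looks roughly like the sketch below.  It is not how MRTG
works internally; RatePoller, read_octets and clock are hypothetical names,
and read_octets stands in for whatever actually does the SNMP get.

```python
import time


class RatePoller:
    """Take a timestamp right after each counter read and use the real
    elapsed time for the rate.  Counter wrap-around is ignored."""

    def __init__(self, read_octets, clock=time.time):
        self._read_octets = read_octets  # hypothetical SNMP reader
        self._clock = clock              # injectable so it can be tested
        self._last = None                # (timestamp, octets) of last poll

    def poll(self):
        """Return (timestamp, bps); bps is None on the first poll."""
        octets = self._read_octets()
        now = self._clock()              # time taken right after the read
        bps = None
        if self._last is not None:
            last_time, last_octets = self._last
            elapsed = now - last_time
            if elapsed > 0:
                bps = (octets - last_octets) / elapsed * 8
        self._last = (now, octets)
        return now, bps


if __name__ == "__main__":
    # Fake counter and clock values, purely for demonstration.
    counters = iter([1000, 26000])
    clock = iter([100.0, 350.0])
    poller = RatePoller(lambda: next(counters), lambda: next(clock))
    print(poller.poll())
    print(poller.poll())
```

Passing the clock in as a parameter is only there so the logic can be
exercised with fake times; in real use the default time.time would do.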

Some help/insight on this matter would be greatly appreciated.  Thank you.
