[mrtg] MRTG "flat-line" after a while of service outage

Fri Apr 13 01:28:55 CEST 2007

On 13/04/07, Eric Brander <mailinglists at rednarb.com> wrote:
First - thanks for taking the time to answer my questions.

On 4/12/07, Amos Shapira <amos.shapira at gmail.com> wrote:
> >
> > Hello,
> >
> > I've set up MRTG to graph the throughput of some internal software using an
> > external program, and it worked well for a few weeks.
> >
> > Yesterday we took down all the monitored services for a major database
> > overhaul, and since then MRTG records the service's throughput as zero even
> > though the external program continues to show increasing counters of work
> > done.
>
>
> The unknaszero option handles this.
>

It's not set in my configuration.

> > For instance, running the external program outputs:
> >
> > 5990344
> > 5990344
> > 0
> >
> > And a few seconds later it will show:
> >
> > 5990414
> > 5990414
> > 0
>
>
Here are new samples, taken exactly 10 seconds apart:

6588843
6588843
0

6588904
6588904
0

The difference is 61, which suggests a rate of 6.1 per second.
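
The rate estimate above can be sketched in Python. The function name is only illustrative and is not part of MRTG or of the external program:

```python
def per_second_rate(first, second, seconds_apart):
    """Average counter increase per second between two readings."""
    if seconds_apart <= 0:
        raise ValueError("seconds_apart must be positive")
    return (second - first) / seconds_apart


# The two samples quoted above, taken 10 seconds apart.
rate = per_second_rate(6588843, 6588904, 10)
print(rate)
```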

> > Nothing has changed in the program or the MRTG configuration during that
> > time.
>
>
>
>
> Here is the relevant part from that counter's config file:
> >
> > Target[total]: `total-ti-counter /mnt/monitored/*/total.txt`
> > MaxBytes[total]: 50
> > Directory[total]: total
> > Title[total]: Total
>
>
> There should be a lot more relevant info in your config file. This doesn't
> show the whole picture.
>
>

> > MaxBytes is way more than the expected throughput (the throughput based on
> > the numbers above is 7.8 per second), but even setting it to 1000
> > instead of 50 doesn't help.
> >
> > What could be wrong?
> >
> >
> What does your .log file look like? Does MRTG report an error when it runs
> against this config, and if so, what is the error? What polling interval are
> you running? Any options such as perminute, perhour, etc.? Bits or bytes?

Here is the top of the log file:

1176419402 6585458 6585458
1176419402 0 0 0 0

After that, the entries are zeros all the way back to just before the monitored
services were taken down, where the log shows normal-looking numbers.
I executed MRTG manually many times, also with --debug. No errors are given.
The polling interval is set to 5 minutes through a cron job.
No other options are set. What you see there is all there is (the only field
I dropped from the e-mail was the "PageTop" setting).
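
For reference, here is a small Python sketch of how I read the two log lines quoted above. It assumes the usual MRTG log layout as I understand it: the first line holds the last run's timestamp and the raw in/out counters, and each following line holds a timestamp followed by average in, average out, maximum in and maximum out rates. The function and key names are made up for illustration:

```python
def parse_mrtg_log(text):
    """Split an MRTG .log excerpt into the counter line and the rate history."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    head = rows[0]
    # First line: timestamp, raw "in" counter, raw "out" counter.
    last = {
        "time": int(head[0]),
        "in_counter": int(head[1]),
        "out_counter": int(head[2]),
    }
    history = []
    for fields in rows[1:]:
        # Later lines: timestamp, avg in, avg out, max in, max out.
        t, avg_in, avg_out, max_in, max_out = (int(f) for f in fields)
        history.append({"time": t, "avg_in": avg_in, "avg_out": avg_out,
                        "max_in": max_in, "max_out": max_out})
    return last, history


sample = "1176419402 6585458 6585458\n1176419402 0 0 0 0\n"
last, history = parse_mrtg_log(sample)
print(last["in_counter"], history[0]["avg_in"])
```

Read this way, the first line shows that MRTG did pick up the current counter value, while the recorded rate for that interval is zero.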

> From the sample output you gave it looks like it's more than 50, heck in the
> 2 samples over only a "few seconds" the difference is 70. And depending on
> that interval, you could easily exceed 1000 over a 5-minute polling cycle.
> Without knowing more I'd think your MaxBytes setting is wrong by a handful
> of zeros. Your total.log file will show 2 entries and a bunch of zeros if
> MaxBytes is too low.

I thought that MaxBytes is about the maximum expected rate of change, not
the total absolute counter value. For instance, when monitoring a LAN card,
MaxBytes would be the maximum number of bytes the card can transfer PER
SECOND, not how many bytes the card can send or receive in a 5-minute
interval. With the above sample of 61 in 10 seconds, MaxBytes would be
compared to 6.1 for one second, not to the absolute increase of 1830 over
5 minutes, would it?
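
To make my reading of MaxBytes concrete, here is a Python sketch of the check as I imagine it. This is an assumption about the behaviour, not MRTG's actual code, and the function name is hypothetical:

```python
def check_against_max_bytes(prev_counter, curr_counter, elapsed_seconds, max_bytes):
    """Return (rate, accepted), where rate is the per-second increase.

    Assumption: the limit is applied to the per-second rate, not to the
    raw counter difference accumulated over the polling interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rate = (curr_counter - prev_counter) / elapsed_seconds
    return rate, rate <= max_bytes


# 6.1 per second sustained over a 5-minute (300 s) poll adds 1830 to the counter.
rate, accepted = check_against_max_bytes(6588843, 6588843 + 1830, 300, 50)
print(rate, accepted)
```

Under this assumption a MaxBytes of 50 would accept the reading, since the rate is compared to the limit and the increase of 1830 is not.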

I'll try to increase MaxBytes to 5000 and see what happens.

-- 
> Eric Brander

Thanks,

--Amos