[mrtg-developers] Fwd: Re: MRTG check scheduling while in daemon mode

Tue Oct 5 21:03:19 CEST 2010

	Sorry, wrong identity first time
	/N

-------- Original Message --------
Subject: Re: [mrtg-developers] MRTG check scheduling while in daemon mode
Date: Tue, 05 Oct 2010 17:49:54 +0100
From: Niall O'Reilly <Niall.oReilly at ucd.ie>
To: mrtg-developers at lists.oetiker.ch

On 03/10/10 23:18, Steve Shipway wrote:
> Has anyone on the list any thoughts on the MRTG check scheduler?

	Not exactly, but rather some comments on the method
	you suggest.

> Currently (we're considering Daemon mode only, here), every 5 mins it will
> run ALL the Target checks sequentially, running multiple threads according
> to the Forks: directive.  After all checks are finished, it will sleep until
> the next 5-min cycle starts.
> 
> This is sub-optimal because
> 
> 1)      You get a huge burst of CPU usage followed by a period of silence,
> which can make the frontend go slow and messes up monitoring of the system's
> own CPU
> 
> 2)      If the checks exceed the 5min window, then you miss a polling cycle
> and need to manually tune your forks upwards.

	Agreed.

> I would propose an alternative method of scheduling.
> 
>  
> 
> 1.       Rather than specifying a number of forks, make it a MAXIMUM number
> (a bit like when defining threads in apache)

	Makes sense.

> 2.       After the initial read of the CFG files, MRTG knows how many
> Targets there are. Divide the Interval by this to get the interleave.

	I'd be inclined to divide by a bigger number than the Target
	count (maybe twice the count or so), and so create some slack
	for delays caused by equipment or network incidents.
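
	As a rough sketch of that arithmetic (Python purely for
	illustration, not MRTG code; the function names and the
	slack_factor parameter are hypothetical), dividing the Interval
	by twice the Target count packs the check starts into the first
	half of the window and leaves the rest as slack:

```python
def interleave_seconds(interval_s: float, target_count: int,
                       slack_factor: float = 2.0) -> float:
    """Gap between successive check starts, in seconds.

    slack_factor is an assumed tuning knob: 1.0 spreads the starts over
    the whole interval, 2.0 packs them into the first half.
    """
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    if slack_factor < 1.0:
        raise ValueError("slack_factor must be at least 1.0")
    return interval_s / (target_count * slack_factor)


def start_offsets(interval_s: float, target_count: int,
                  slack_factor: float = 2.0) -> list[float]:
    """Offset (seconds from the start of the cycle) of each check."""
    step = interleave_seconds(interval_s, target_count, slack_factor)
    return [i * step for i in range(target_count)]


if __name__ == "__main__":
    # 100 Targets in a 5-minute (300 s) interval.
    offsets = start_offsets(300.0, 100)
    print(interleave_seconds(300.0, 100), offsets[-1])
```

	With 100 Targets and a 300 s Interval, the last check starts
	at 148.5 s, so roughly half the window remains for stragglers.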

>  Then,
> start a new check every interleave, starting a new thread if necessary and
> if we've not hit the maximum threads.
> 
> Benefits would be that it can expand to handle more targets, and spreads the
> load over the window.
> 
> Disadvantages would be that

	Adding deliberate jitter to the probe cycle might disturb
	the interpolation of values for the nominal (interval-aligned)
	probe instants, or at least give rise to "interesting" aliasing.

> it is hard to tell when you're reaching capacity,

	Depends on what is available by way of fork management.

	Wouldn't it make sense to (try to) count elapsed, CPU, and
	wait (disk and network) times for each fork, and derive some
	estimate of remaining headroom?
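
	One possible shape for such an estimate, as a sketch only
	(Python for illustration; CheckTiming, headroom and
	wait_fraction are hypothetical names, and how the timings
	would actually be collected from each fork is left open):
	treat Interval x maximum forks as the available capacity,
	and the summed elapsed time of the checks as the part used.

```python
from dataclasses import dataclass


@dataclass
class CheckTiming:
    """Timings for one completed check (assumed to be measured per fork)."""
    elapsed_s: float  # wall-clock time the check took
    cpu_s: float      # CPU time the check consumed

    @property
    def wait_s(self) -> float:
        # Time spent waiting on disk or network rather than computing.
        return max(self.elapsed_s - self.cpu_s, 0.0)


def headroom(timings: list[CheckTiming], interval_s: float,
             max_forks: int) -> float:
    """Fraction of fork capacity left unused in one polling cycle.

    1.0 means idle, 0.0 means the cycle is exactly full, and a
    negative value means the checks no longer fit in the interval.
    """
    capacity_s = interval_s * max_forks
    used_s = sum(t.elapsed_s for t in timings)
    return 1.0 - used_s / capacity_s


def wait_fraction(timings: list[CheckTiming]) -> float:
    """Share of the elapsed time that was spent waiting, not on CPU."""
    total = sum(t.elapsed_s for t in timings)
    return sum(t.wait_s for t in timings) / total if total else 0.0
```

	A high wait fraction would suggest that more forks could help;
	a low one would suggest the host CPU is the limit.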

	I have very little idea of the level of difficulty involved
	in this.

> and (more importantly) it might be hard to do the optimisation
> that MRTG does where a single device is queried once for all interfaces.

	Probably less of a problem than it looks at first sight.

	The grouping of Targets MRTG already does could surely
	be exploited as input to the interleaving calculation.
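
	A sketch of that idea, under the assumption that each Target
	can be mapped to the device it is polled from (Python for
	illustration; schedule_by_device and the (target, device)
	pairs are hypothetical, not MRTG's internal structures):
	interleave per device rather than per Target, so that every
	device is still queried once for all its interfaces.

```python
from collections import defaultdict


def schedule_by_device(targets: list[tuple[str, str]], interval_s: float,
                       slack_factor: float = 2.0):
    """Return (offset_s, device, target_names) tuples, one per device.

    targets is a list of (target_name, device) pairs. All Targets on
    one device share a single slot, so the device is polled once.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, device in targets:
        groups[device].append(name)
    if not groups:
        return []
    # The interleave is computed from the device count, not the Target count.
    step = interval_s / (len(groups) * slack_factor)
    return [(i * step, device, names)
            for i, (device, names) in enumerate(groups.items())]


if __name__ == "__main__":
    demo = [("r1_eth0", "r1"), ("r1_eth1", "r1"), ("r2_eth0", "r2")]
    for slot in schedule_by_device(demo, 300.0):
        print(slot)
```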

> We coded up basically this system here, however it didn't use MRTG in daemon
> mode which negates a lot of the benefits you can gain from daemon mode

	Not only that, but retaining state from run to run may allow
	Target 'reputation' (based on delays and retries) to be used
	to tune the interleaving strategy for the actual environment.
	Without daemon mode, this opportunity would either have to be
	systematically forgone, or would require caching to disk.
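
	To make 'reputation' concrete, here is one way it might be
	kept in memory between cycles. This is a sketch with made-up
	choices throughout (Python for illustration; the Reputation
	class, the smoothing factor alpha, and the rule of scaling a
	sample by 1 + retries are all assumptions): a moving average
	of how costly each Target has been, used to start the slowest
	Targets first.

```python
class Reputation:
    """Per-Target exponentially weighted moving average of check cost."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha                  # weight given to the newest sample
        self.score: dict[str, float] = {}   # Target name -> smoothed cost

    def record(self, target: str, elapsed_s: float, retries: int) -> None:
        # Assumed penalty: a check that needed retries counts as costlier.
        sample = elapsed_s * (1 + retries)
        old = self.score.get(target)
        if old is None:
            self.score[target] = sample
        else:
            self.score[target] = (1 - self.alpha) * old + self.alpha * sample

    def polling_order(self) -> list[str]:
        """Targets sorted slowest first, so they get the earliest slots."""
        return sorted(self.score, key=lambda t: self.score[t], reverse=True)
```

	Because the scores live in the process, they survive from
	cycle to cycle only while the daemon keeps running.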

> and
> the new RRD memory-mapped IO.  I've not yet looked at coding it directly
> into the MRTG code.
> 
> Anyone have any thoughts?

	You did ask!  8-)

	I hope this helps.

	Niall O'Reilly