[smokeping-users] Large deployment problems
darren at victoriajd.com
Wed Aug 3 14:56:15 CEST 2011
On 3 August 2011 08:15, Josh Wisman <jwisman at gmail.com> wrote:
> 2. Because of the number of nodes, fping poller does not finish in 300
> seconds. I have blazemode enabled. Is there a way to run multiple fping
> probes or increase parallelization? Any help would be greatly appreciated.
As others have mentioned, the answer is yes, and it is documented. I run 20
FPing probes on each of 3 slaves, with each slave polling ~1000 hosts
(50 hosts per probe).
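
For reference, a minimal sketch of how several FPing instances can be
declared in the Smokeping config, following the multiple-probe-instance
layout from the Smokeping documentation. Each "++" instance is polled by
its own process. The instance names (FPingA, FPingB), the fping binary
path, the offsets and the target names below are illustrative
assumptions, not taken from any real deployment, and only two instances
are shown rather than 20:

    *** Probes ***

    + FPing
    # assumed path; adjust to wherever fping lives on the slave
    binary = /usr/sbin/fping
    blazemode = true

    # hypothetical instance names; one poller process per instance
    ++ FPingA
    offset = 0%

    ++ FPingB
    offset = 50%

    *** Targets ***

    probe = FPingA
    menu = Top
    title = Network Latency Grapher

    # hypothetical group whose hosts are polled by FPingB
    + GroupB
    menu = Group B
    title = Hosts polled by FPingB
    probe = FPingB

    ++ host1
    menu = host1
    title = host1
    host = host1.example.com

Targets inherit the probe from their parent section, so assigning a
probe instance at the group level is enough to spread hosts across
instances.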
One problem I have found with running multiple FPing probes is that
individual probes on my slaves tend to die from time to time.
I haven't been able to figure out why they die, but I have found that
monit (http://mmonit.com/monit/) is particularly effective in keeping
the required number of probes running.
My monit config for smokeping is quite simple and looks like so:
check process smokeping with pidfile /var/smokeping/smokeping.pid
    start program = "/etc/init.d/smokeping start"
    stop program = "/etc/init.d/smokeping stop"
    if children < 20 then restart
    if 3 restarts within 5 cycles then alert
The above will restart smokeping whenever fewer than the required
number (20) of probes are found, and alert me via email if 3
restarts occur within 5 checks.
Hope this helps,