[smokeping-users] Large deployment problems

Darren Murphy darren at victoriajd.com
Wed Aug 3 14:56:15 CEST 2011


On 3 August 2011 08:15, Josh Wisman <jwisman at gmail.com> wrote:

> 2. Because of the number of nodes, fping poller does not finish in 300
> seconds.  I have blazemode enabled. Is there a way to run multiple fping
> probes or increase parallelization? Any help would be greatly appreciated.

As others have mentioned, the answer is yes, and it is documented. I run 20
FPing probes on each of 3 slaves, with each slave polling ~1000 hosts
(50 hosts per probe).
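For anyone wanting to reproduce that split, the arithmetic is simply 1000 hosts / 50 hosts per probe = 20 probe instances per slave. Below is a rough Python sketch (not part of SmokePing) that generates Probes and Targets fragments for such a split. It assumes SmokePing's convention of declaring probe instances as `++` subsections under `+ FPing` and selecting one per branch with `probe = ...`. The helper names, the fping path, the group and host section names and the menu/title values are all hypothetical. The output is only a fragment (no General or Database sections), so check it against the smokeping_config documentation before using it.

```python
from typing import List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` entries."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def make_config(hosts: List[str], per_probe: int = 50,
                fping: str = "/usr/sbin/fping") -> str:
    """Return Probes/Targets config text with one FPing instance per group.

    Hypothetical helper; the fping path and section names are assumptions.
    """
    groups = chunk(hosts, per_probe)

    # Probe instances are declared as ++ subsections of the FPing probe.
    lines = ["*** Probes ***", "", "+ FPing", f"binary = {fping}", ""]
    for n in range(1, len(groups) + 1):
        lines.append(f"++ FPing{n}")

    # The top level of Targets needs probe, menu and title; each group
    # then overrides probe so its hosts are handled by their own instance.
    lines += ["", "*** Targets ***", "", "probe = FPing1",
              "menu = Top", "title = Latency", ""]
    for n, group in enumerate(groups, start=1):
        lines += [f"+ group{n}", f"menu = Group {n}",
                  f"title = Group {n}", f"probe = FPing{n}", ""]
        for host in group:
            # Assumed: section names may not contain dots, so use underscores.
            name = host.replace(".", "_")
            lines += [f"++ {name}", f"menu = {host}",
                      f"title = {host}", f"host = {host}", ""]
    return "\n".join(lines)
```

Feeding it 1000 made-up addresses, e.g. `make_config([f"10.0.{i // 256}.{i % 256}" for i in range(1000)])`, yields 20 groups of 50 hosts served by FPing1 through FPing20.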
One problem I have found with running multiple FPing probes is that
individual probes on my slaves tend to die from time to time.
I haven't been able to figure out why they die, but I have found that
monit (http://mmonit.com/monit/) is particularly effective in keeping
the required number of probes running.
My monit config for smokeping is quite simple and looks like so:

check process smokeping with pidfile /var/smokeping/smokeping.pid
    start program = "/etc/init.d/smokeping start"
    stop program  = "/etc/init.d/smokeping stop"
    if children < 20 then restart
    if 3 restarts within 5 cycles then alert

The above will restart smokeping whenever fewer than the required
number (20) of probes are found running, and will alert me via email if
3 restarts occur within 5 checks.
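To make the restart/alert behaviour concrete, here is a toy Python model of the two monit rules above. It only illustrates the intended logic and is not how monit is implemented. It assumes one child-count sample per monit cycle, counts restarts over a sliding window of the last 5 cycles, and (as an assumption) raises the alert once when the threshold is first reached rather than on every cycle while it stays reached.

```python
from collections import deque
from typing import Iterable, List, Tuple


def simulate(child_counts: Iterable[int], required: int = 20,
             max_restarts: int = 3, window: int = 5) -> List[Tuple[int, str]]:
    """Toy model of 'if children < 20 then restart' plus
    'if 3 restarts within 5 cycles then alert'.

    child_counts holds one sampled child-process count per monit cycle.
    Returns a list of (cycle_number, action) events.
    """
    recent = deque(maxlen=window)  # True for each recent cycle that restarted
    events: List[Tuple[int, str]] = []
    alerting = False
    for cycle, children in enumerate(child_counts, start=1):
        restarted = children < required
        recent.append(restarted)
        if restarted:
            events.append((cycle, "restart"))
        # Assumption: alert once when the threshold is first reached,
        # not on every cycle while it stays reached.
        over = sum(recent) >= max_restarts
        if over and not alerting:
            events.append((cycle, "alert"))
        alerting = over
    return events
```

For example, `simulate([20, 19, 20, 18, 17])` reports restarts at cycles 2, 4 and 5, and the alert at cycle 5, when the third restart falls inside the 5-cycle window.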

Hope this helps,
Darren
