[smokeping-users] Re: fping efficiency

Tue Nov 22 15:53:42 MET 2005

On Tue, Nov 22, 2005 at 12:50:08PM +0800, ChunjingHan wrote:

> 	Now I begin to consider the efficiency of smokeping.
> 	I find the introducation of the fping software ,the 
>       reference manuals explain the fping is a program to 
>       ping hosts in parallel. what's parallel way? I think 
>       that the fping pings every host in the config file 
>       orderly but not parallelly. 
> 	But the above I might wrong understand the parallelly ,
>       whether the parallel way means that the fping send the 
>       20 pings parallelly when pinging one host ? 
> 	with the increase of smokeping hosts amount ,I find that in  
>       5 minutes the smokeping can't finish all ping work,Do you meet 
>       the problem? How can reduce the time of a round-robin ping? 

Hi,

First: please use the smokeping-users mailing list rather than my
personal address; that way others can answer and/or benefit from the
answers as well. I'm CCing the list, so please keep the CC or (preferably)
reply directly to the list. (Note that I first sent this accidentally
to the wrong list address; please disregard the first mail. Apologies
for the confusion.)

The fping program indeed does ping hosts in parallel. This means it does
not wait for one host to reply to all 20 pings before proceeding to the
next host. However, the 20 pings to each host are sent sequentially, not
in parallel. In other words, the program first sends one ping to each
host, then a second ping to each host, and so on.
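
The send order described above can be sketched as a pair of nested
loops. This is only an illustration of the ordering, not fping's actual
code; it ignores timing and reply handling, and the host names are
made up.

```python
def send_order(hosts, pings):
    """Yield (round, host) pairs in the order the pings would go out."""
    # Outer loop: one round per ping number.
    for round_no in range(1, pings + 1):
        # Inner loop: every host gets its ping for this round
        # before any host gets the next one.
        for host in hosts:
            yield round_no, host


# Hypothetical hosts "a", "b", "c" with 2 pings each.
order = list(send_order(["a", "b", "c"], 2))
for round_no, host in order:
    print(f"round {round_no}: ping {host}")
```

With three hosts and two pings, every host is pinged once before any
host receives its second ping.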

Your statement about 5 minutes not being enough surprised me, so I
looked a bit into it.  I found the total time ($T) spent on the
pinging to be something like

$T = $N * $C + ($M - 1) * max($p, $R) + $R + $t

where 
- $N is the number of hosts
- $C is a per-host initial delay that is around 6 milliseconds on my system
- $M is the number of pings
- $p is the 'hostinterval' parameter (fping "-p"), 1 second by default
- $t is the fping "-t" parameter, 0.5 seconds by default
- $R is the time that sending one ping to all hosts takes:

$R = $N * (floor($i/$S) + 1) * $S

- $i is the 'mininterval' parameter (fping "-i"), 0.01 seconds by default
- $S is the DEFAULT_SELECT_TIMEOUT value compiled into fping, 
  0.01 seconds by default
- floor(X) is the largest integer that's less than or equal to X
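
For anyone who wants to plug in their own numbers, here is a minimal
Python sketch of the estimate. The function and argument names are
invented for illustration; the defaults are the values listed above.
The spacing between rounds is taken as the larger of $p and $R, on the
assumption that a new round can start neither sooner than $p after the
previous one nor before the previous round has been sent out.

```python
def round_time(n_hosts, mininterval=0.01, select_timeout=0.01):
    """$R: seconds needed to send one ping to every host."""
    # Work in whole milliseconds so that floor($i/$S) is an exact
    # integer division (assumes both values are multiples of 1 ms).
    i_ms = round(mininterval * 1000)
    s_ms = round(select_timeout * 1000)
    return n_hosts * (i_ms // s_ms + 1) * s_ms / 1000.0


def total_time(n_hosts, pings=20, per_host_delay=0.006,
               hostinterval=1.0, timeout=0.5,
               mininterval=0.01, select_timeout=0.01):
    """$T: estimated seconds for one full measurement round."""
    r = round_time(n_hosts, mininterval, select_timeout)
    # Assumption: successive rounds are spaced by the larger of $p and $R.
    spacing = max(hostinterval, r)
    return n_hosts * per_host_delay + (pings - 1) * spacing + r + timeout


for n in (10, 100, 1000):
    print(n, "hosts:", round(total_time(n), 2), "seconds")
```

With few hosts the 1 second 'hostinterval' dominates; once $R exceeds
it, the estimate grows linearly with the number of hosts.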

For large $N with default values, I guess we can approximate
$T =~ $N * (.006 + ($M - 1) * .02) =~ $N * $M * .02

i.e. 20 milliseconds per ping per host.

For 1000 hosts and 20 pings, this would make around 400 seconds.
The 300 second mark would be at around 700-750 hosts.
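
The arithmetic behind these two figures, as a small Python sketch. It
uses only the rough 20 milliseconds per ping per host approximation, so
it gives the upper end of the host range; the function names are
invented for illustration.

```python
PER_PING_MS = 20  # rough cost per ping per host, from the approximation


def approx_total_seconds(n_hosts, pings):
    """Approximate measurement time: N * M * 0.02 seconds."""
    return n_hosts * pings * PER_PING_MS / 1000


def max_hosts(budget_seconds, pings):
    """Largest host count whose approximate time fits in the budget."""
    # Integer milliseconds keep the division exact.
    return (budget_seconds * 1000) // (pings * PER_PING_MS)


print(approx_total_seconds(1000, 20))  # 400.0
print(max_hosts(300, 20))              # 750
```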

How many hosts are you pinging?

As you see, the parameters that significantly affect the measurement
time are the number of hosts, the number of pings and the 'mininterval'
parameter.  I wouldn't really recommend lowering 'mininterval', though.
If you're really probing over 600 hosts and don't want to touch the
number of pings or the maximum measurement time, consider splitting the
probing between multiple servers, possibly with the Smokeping '--filter'
option.

If you want to stick with just one server and lower 'mininterval',
you'll have to run fping (and thus Smokeping also) as root (suid isn't
enough) or modify the MIN_INTERVAL definition in fping.c and recompile.
Another option is to define two FPing probes and split the load between
them. See the 'config.fping-instances' section in the
smokeping_examples document.
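
To get a feel for what splitting buys, here is a rough Python sketch
using the 20 milliseconds per ping per host approximation. It assumes
that the instances (or servers) run at the same time and do not slow
each other down, and that the hosts are divided evenly; none of that
has been measured here. The function name is invented for
illustration.

```python
PER_PING_MS = 20  # rough cost per ping per host, from the approximation


def per_instance_seconds(n_hosts, pings, instances):
    """Approximate time for the busiest instance after an even split."""
    # Ceiling division: the largest share any one instance gets.
    hosts_each = (n_hosts + instances - 1) // instances
    return hosts_each * pings * PER_PING_MS / 1000


for k in (1, 2, 4):
    print(k, "instance(s):", per_instance_seconds(1000, 20, k), "seconds")
```

Under these assumptions, 1000 hosts with 20 pings drop from around 400
seconds on one instance to around 200 seconds on two.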

If anybody else has good or bad experiences with fping scaling, please
comment. There might well be errors in the calculations, but my
tests on an empty /23 network (512 hosts) seem to support these.

Cheers,
-- 
Niko
