[smokeping-users] Re: Scalability

Marc Powell mpowell at ena.com
Thu Jun 20 02:29:44 MEST 2002


Sure thing. Here is what I did: I created a test smokeping binary that points to my original config file, which has 546 hosts on this particular data collector. I ran it with -debug and -nodaemon (I think -debug implies -nodaemon, but I wanted to cover all bases).
 
# [smokep at dc2 ~/bin]date ; ./smokeping.test -debug -nodaemon ; date             
Wed Jun 19 19:20:10 CDT 2002
### fping seems to report in 1 miliseconds
Launched successfully
FPing: probing 546 targets
Wed Jun 19 19:28:11 CDT 2002
 
This 8-minute duration seems to be fairly consistent, at least right now ;)
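
For reference, the arithmetic behind that figure can be checked with a short Python sketch. The two timestamps and the target count come from the run above; the 20-second figure is the theoretical estimate from the quoted message below, and the variable names are purely illustrative. The per-target number is only an average and says nothing about how fping actually schedules its probes.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Timestamps printed by `date` before and after the test run.
start = datetime.strptime("19:20:10", FMT)
end = datetime.strptime("19:28:11", FMT)

targets = 546            # "FPing: probing 546 targets"
expected_seconds = 20    # theoretical run length from the quoted message

elapsed = (end - start).total_seconds()
per_target = elapsed / targets
ratio = elapsed / expected_seconds

print(f"elapsed: {elapsed:.0f} s")
print(f"average per target: {per_target:.2f} s")
print(f"about {ratio:.0f}x the theoretical {expected_seconds} s")
```

So the run took 481 seconds, an average of roughly 0.88 seconds per target.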
 
Here's a snippet of truss output taken about 4 minutes into the run:
 
[smokep at dc2 ~]$ date
Wed Jun 19 19:23:49 CDT 2002
[smokep at dc2 ~]$ truss -fea -p 14837
14837:  psargs: /usr/local/bin/perl -w ./smokeping.test -debug -nodaemon
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 69
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 74
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 75
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 18
14837:  read(7, "   f r o m  ", 5120)                   = 6
14837:  read(7, " 1 0 . 5 5 . 0 . 1 1", 5120)           = 10
14837:  read(7, "   f o r   I C M P   E c".., 5120)     = 23
14837:  read(7, " 1 7 2 . 3 1 . 5 6 . 2", 5120)         = 11
14837:  read(7, "\n", 5120)                             = 1
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 74
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 75
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 69
14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70
14837:  read(7, 0x004E380C, 5120)       (sleeping...)
^C[smokep at dc2 ~]$ date
Wed Jun 19 19:24:44 CDT 2002
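
If a longer capture is needed, a small Python sketch along these lines could tally what the process is doing. The `classify` helper and its category names are hypothetical, not part of smokeping or truss; the sample lines are copied from the trace above. It assumes the trace keeps the same layout, with truss printing the buffer contents with a space between characters.

```python
from collections import Counter

# A few lines copied verbatim from the truss capture above.
SAMPLE = [
    '14837:  psargs: /usr/local/bin/perl -w ./smokeping.test -debug -nodaemon',
    '14837:  read(7, " I C M P   T i m e   E x".., 5120)     = 70',
    '14837:  read(7, 0x004E380C, 5120)       (sleeping...)',
    '14837:  read(7, "   f r o m  ", 5120)                   = 6',
]


def classify(line):
    """Put one truss line into a rough category (names are illustrative)."""
    if "read(" not in line:
        return "other"
    if "(sleeping...)" in line:
        return "blocked"            # read() is waiting for more input
    # Collapse the spacing truss inserts so the text can be matched.
    if "ICMPTimeEx" in line.replace(" ", ""):
        return "time_exceeded"      # start of an "ICMP Time Ex..." message
    return "data"                   # any other read that returned bytes


counts = Counter(classify(line) for line in SAMPLE)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

Feeding it the whole capture instead of `SAMPLE` would show how many reads were "ICMP Time Ex..." messages compared with how often the process was simply blocked waiting.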

If there is anything else that I can provide that would be of assistance, please don't hesitate to let me know.
 
Thanks,
 
Marc

	-----Original Message----- 
	From: Tobias Oetiker [mailto:oetiker at ee.ethz.ch] 
	Sent: Wed 6/19/2002 5:26 PM 
	To: Marc Powell 
	Cc: Smokeping 
	Subject: Re: [smokeping-users] Re: Scalability
	
	

	Yesterday Marc Powell wrote: 

	> The only major problem I am having is that I see gaps in the graphs 
	> (10-15 minutes) for those regions with relatively high numbers of hosts 
	> down (20-30). We're monitoring schools so it's the off season here in 
	> the US and the routers fluctuate depending on what maintenance is going 
	> on at the schools, whether the janitor has spilt his coffee in the 
	> router, etc... I am attributing the gaps to the slower response time for 
	> ICMP UNREACHABLE's from fping, which lengthens the overall time it takes 
	> before smokeping spawns the next run to 10-15 minutes or longer. Since 
	> smokeping appears to wait until the fping process terminates before 
	> writing to the RRDs or spawning the next fping process, the gaps are 
	> appearing for all hosts in a region. To minimize the number of hosts 
	> affected by this problem, I have just implemented unique configurations 
	> for each alphabetical grouping per region so that I can spawn a 
	> smokeping daemon for each grouping as opposed to each region (i.e. 5 
	> smokeping processes per data collector). As a result, I have a feature 
	> request or two to make things easier: 

	try running smokeping by hand, at least in theory it should ping 
	ALL the hosts in your config in parallel. The time it will wait for 
	a 'lost' packet is about 1 second at most so this means in theory a 
	fping run is over in 20 seconds regardless of the number of 
	machines involved. Now there is a small gap between each icmp 
	packet sent out from fping, so there is an impact per machine but 
	it should not at all depend on how long the machine has to answer 
	... this after all is the whole motivation behind fping ... 

	>       1) Add a pidfile directive to either complement or replace 
	> piddir. Currently, it is necessary to either create a directory for each 
	> pid file specifically or remove the pidfile before starting the next 
	> smokeping process. 

	running multiple smokeping processes is not the solution ... if 
	fping has a bug, we will fix fping ... 

	>       2) The ability to INCLUDE external files within a config file. 
	> This should help cut down on the number of unique files I'm having to 
	> create. 

	this is already there ... check the documentation on 
	ISG::ParseConfig 

	cheers 
	tobi 

	-- 
	 ______    __   _ 
	/_  __/_  / /  (_) Oetiker, OETIKER+PARTNER AG, Gallusstrasse 25 
	 / // _ \/ _ \/ / CH-4600 Olten, phoneto:+41(0)62 213 9909 
	/_/ \.__/_.__/_/ tobi at oetiker.ch http://google.com/search?q=tobi 

