[smokeping-users] Re: Scalability
Marc Powell
mpowell at ena.com
Thu Jun 20 02:29:44 MEST 2002
Sure thing. Here is what I have done: I created a test smokeping binary that points to my original config file, which has 546 hosts on this particular data collector. I ran it with -debug and -nodaemon (I think debug implies nodaemon, but I wanted to cover all bases).
# [smokep at dc2 ~/bin]date ; ./smokeping.test -debug -nodaemon ; date
Wed Jun 19 19:20:10 CDT 2002
### fping seems to report in 1 miliseconds
Launched successfully
FPing: probing 546 targets
Wed Jun 19 19:28:11 CDT 2002
This 8 minute duration seems to be fairly consistent, at least right now ;)
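For reference, the arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the timestamps and the host count are copied from the output above, and the per-target average is just a division, not a claim about how fping schedules its probes.

```python
from datetime import datetime

# Timestamps copied from the two `date` calls around the test run.
FMT = "%H:%M:%S"
start = datetime.strptime("19:20:10", FMT)
end = datetime.strptime("19:28:11", FMT)

hosts = 546  # from the "FPing: probing 546 targets" line

elapsed = (end - start).total_seconds()
per_host = elapsed / hosts  # plain average, not a per-probe measurement

print(f"elapsed: {elapsed:.0f} s ({elapsed / 60:.1f} min)")
print(f"average per target: {per_host:.2f} s")
```

That works out to 481 seconds for the whole run, or a little under one second per target on average.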
Here's a snippet of truss about 4 minutes into the run:
[smokep at dc2 ~]$ date
Wed Jun 19 19:23:49 CDT 2002
[smokep at dc2 ~]$ truss -fea -p 14837
14837: psargs: /usr/local/bin/perl -w ./smokeping.test -debug -nodaemon
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 69
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, " I C M P T i m e E x".., 5120) = 74
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, " I C M P T i m e E x".., 5120) = 75
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 18
14837: read(7, " f r o m ", 5120) = 6
14837: read(7, " 1 0 . 5 5 . 0 . 1 1", 5120) = 10
14837: read(7, " f o r I C M P E c".., 5120) = 23
14837: read(7, " 1 7 2 . 3 1 . 5 6 . 2", 5120) = 11
14837: read(7, "\n", 5120) = 1
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, " I C M P T i m e E x".., 5120) = 74
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, " I C M P T i m e E x".., 5120) = 75
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
14837: read(7, " I C M P T i m e E x".., 5120) = 69
14837: read(7, " I C M P T i m e E x".., 5120) = 70
14837: read(7, 0x004E380C, 5120) (sleeping...)
^C[smokep at dc2 ~]$ date
Wed Jun 19 19:24:44 CDT 2002
If there is anything else that I can provide that would be of assistance, please don't hesitate to let me know.
Thanks,
Marc
-----Original Message-----
From: Tobias Oetiker [mailto:oetiker at ee.ethz.ch]
Sent: Wed 6/19/2002 5:26 PM
To: Marc Powell
Cc: Smokeping
Subject: Re: [smokeping-users] Re: Scalability
Yesterday Marc Powell wrote:
> The only major problem I am having is that I see gaps in the graphs
> (10-15 minutes) for those regions with relatively high numbers of hosts
> down (20-30). We're monitoring schools so it's the off season here in
> the US and the routers fluctuate depending on what maintenance is going
> on at the schools, whether the janitor has spilt his coffee in the
> router, etc... I am attributing the gaps to the slower response time for
> ICMP UNREACHABLE's from fping, which lengthens the overall time it takes
> before smokeping spawns the next run to 10-15 minutes or longer. Since
> smokeping appears to wait until the fping process terminates before
> writing to the RRDs or spawning the next fping process, the gaps are
> appearing for all hosts in a region. To minimize the number of hosts
> affected by this problem, I have just implemented unique configurations
> for each alphabetical grouping per region so that I can spawn a
> smokeping daemon for each grouping as opposed to each region (i.e. 5
> smokeping processes per data collector). As a result, I have a feature
> request or two to make things easier:
try running smokeping by hand ... at least in theory it should ping
ALL the hosts in your config in parallel. The time it will wait for
a 'lost' packet is about 1 second at most, so in theory an
fping run is over in 20 seconds regardless of the number of
machines involved. Now there is a small gap between each ICMP
packet sent out from fping, so there is an impact per machine, but
it should not depend at all on how long the machine takes to answer
... this after all is the whole motivation behind fping ...
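
The reasoning above can be put into a rough back-of-envelope model. This is a sketch under stated assumptions, not a description of fping's actual scheduler: the function name is hypothetical, the 20 pings per host is inferred from the "20 seconds" figure at 1 second per lost packet, and the 25 ms gap between packets is an assumed value that should be checked against the fping build in use.

```python
def fping_run_estimate(hosts, pings, timeout_s, gap_s):
    """Return (timeout_bound, gap_bound) in seconds for one probe run.

    timeout_bound: each ping to a dead host waits the full timeout, but
                   all hosts wait in parallel, so `hosts` does not appear.
    gap_bound:     packets leave one at a time, gap_s apart, so this term
                   grows with hosts * pings.
    """
    timeout_bound = pings * timeout_s
    gap_bound = hosts * pings * gap_s
    return timeout_bound, gap_bound


# 546 hosts is from the thread. 20 pings and a 1 s timeout are inferred
# from the "20 seconds" figure. The 25 ms gap is an ASSUMED value.
timeout_bound, gap_bound = fping_run_estimate(546, 20, 1.0, 0.025)
print(f"timeout-bound: {timeout_bound:.0f} s")
print(f"gap-bound: {gap_bound:.0f} s")
```

Under these assumed numbers the timeout term stays at 20 seconds whatever the host count, while the gap term scales with the number of hosts, so it is the only one of the two that the size of the config can influence.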
> 1) Add a pidfile directive to either complement or replace
> piddir. Currently, it is necessary to either create a directory for each
> pid file specifically or remove the pidfile before starting the next
> smokeping process.
running multiple smokeping processes is not the solution ... if
fping has a bug, we will fix fping ...
> 2) The ability to INCLUDE external files within a config file.
> This should help cut down on the number of unique files I'm having to
> create.
this is already there ... check the documentation on
ISG::ParseConfig
cheers
tobi
--
______ __ _
/_ __/_ / / (_) Oetiker, OETIKER+PARTNER AG, Gallusstrasse 25
/ // _ \/ _ \/ / CH-4600 Olten, phoneto:+41(0)62 213 9909
/_/ \.__/_.__/_/ tobi at oetiker.ch http://google.com/search?q=tobi