[smokeping-users] Re: Looking for a good way of timing Smokeping
Tobias Oetiker
oetiker at ee.ethz.ch
Thu Jan 15 17:18:25 MET 2004
Today Simon Westlake wrote:
Hi Simon,
try this patch:
--- Smokeping.pm.orig Thu Jan 15 17:14:34 2004
+++ Smokeping.pm Thu Jan 15 17:15:55 2004
@@ -1899,6 +1899,7 @@
     do_log("Launched successfully");
     report_probes($probes);
     while (1) {
+        my $now = time;
         if ($opt{debug}) {
             map { $probes->{$_}->debug(1) if $probes->{$_}->can('debug') }
                 keys %$probes;
@@ -1906,6 +1907,10 @@
         run_probes $probes;
         update_rrds $cfg, $probes, $cfg->{Targets}{probe}, $cfg->{Targets}, $cfg->{General}{datadir};
         exit 0 if $opt{debug};
+        my $runtime = time - $now;
+        warn "WARNING: smokeping took $runtime seconds to complete 1 round of polling. ".
+             "It should complete polling in $cfg->{Database}{step} seconds. ".
+             "You may have unresponsive devices in your setup.\n" if $runtime > $cfg->{Database}{step};
         sleep $cfg->{Database}{step} - time % $cfg->{Database}{step};
     }
 }
Now smokeping will complain when it is taking too long to complete a round.
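To see why an overlong round matters, look at the sleep line at the end of the loop: it always sleeps until the next multiple of the step, so a round that overruns pushes the following round out by a whole step. A small shell sketch of that arithmetic (the function name next_round_start is made up for illustration and is not part of smokeping):

```shell
# next_round_start START RUNTIME STEP
# START   epoch second at which a round began
# RUNTIME seconds the round took
# STEP    polling step in seconds
# Prints the epoch second at which the next round begins, mirroring
#   sleep $step - time % $step
next_round_start() {
    end=$(( $1 + $2 ))
    echo $(( end + $3 - end % $3 ))
}
```

With a step of 300, `next_round_start 0 200 300` prints 300, while `next_round_start 0 320 300` prints 600: the round that should have started at 300 never happens.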
How is the load on your machine while smokeping is polling?
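If you want a number from outside the daemon as well: the loop above exits after one round when run with --debug, so a single debug run can be timed the same way as the MRTG one-liner quoted below. A rough sketch; time_round is a hypothetical helper, not something that ships with smokeping:

```shell
# time_round LOGFILE COMMAND [ARGS...]
# Runs COMMAND once and appends "<start date> runtime was N seconds"
# to LOGFILE. Returns COMMAND's exit status.
time_round() {
    logfile=$1
    shift
    started=$(date)
    t0=$(date +%s)
    "$@"
    status=$?
    t1=$(date +%s)
    echo "$started runtime was $(( t1 - t0 )) seconds" >> "$logfile"
    return "$status"
}
```

Usage would be something like `time_round /home/simon/runtime /usr/local/smokeping/bin/smokeping --debug`, where the smokeping path is only a guess at your install. A debug run alongside the running daemon adds its own ping load, so treat the figure as approximate.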
The reason for the gaps when you widen the step is that your rrds
store the maximal acceptable time between updates (the heartbeat) internally, and the existing files keep the old value. You can use rrdtool tune to change that.
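As a sketch of the rrdtool tune step: the heartbeat is set per data source, so the helper below reads the data source names from `rrdtool info` rather than assuming them. The function names ds_names and tune_heartbeat are made up for illustration, and the heartbeat value you pass is your own choice (twice the new step is an assumption, not something taken from your setup).

```shell
# ds_names: read `rrdtool info` output on stdin and print each
# data source name, taken from the "ds[NAME].minimal_heartbeat" lines.
ds_names() {
    sed -n 's/^ds\[\([^]]*\)]\.minimal_heartbeat.*/\1/p'
}

# tune_heartbeat FILE SECONDS
# Sets the heartbeat of every data source in FILE to SECONDS.
tune_heartbeat() {
    rrd=$1
    heartbeat=$2
    for ds in $(rrdtool info "$rrd" | ds_names); do
        rrdtool tune "$rrd" --heartbeat "$ds:$heartbeat" || return 1
    done
}
```

For a 10 minute step one might run `tune_heartbeat some-target.rrd 1200` on a copy of each rrd file first (the file name is a placeholder) and check the result with `rrdtool info` before touching the live data.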
tobi
> Hi,
>
> A few weeks ago I posted about gaps in Smokeping graphs, and the eventual conclusion was that it was simply taking too long for Smokeping to run.
>
> I tried running it at 10 minutes rather than 5 but, strangely, there were more gaps at 10 minutes than at 5 (this always seems to be the case for me.. I tried it again recently and had the same problem.)
>
> My previous solution was to remove devices from Smokeping that were regularly unresponsive - their removal seemed to resolve the problem.
>
> So, I'm stuck running at 5 minutes, as anything above that seems to produce more gaps. However, I'm adding 20+ devices a week to Smokeping and I'm starting to get gaps again. I'm assuming it's taking too long to run again, but I only have ~15 unresponsive devices at a time (out of 1300) so it doesn't seem to be a problem with excessive timeouts.
>
> I measure the amount of time MRTG takes to run for monitoring purposes by doing:
>
> x=`date +%s`;z=`date`;/usr/local/mrtg-2/bin/mrtg /usr/local/mrtg.cfg;y=`date +%s`;runtime=`expr $y - $x`;echo "$z runtime was $runtime seconds" >>/home/simon/runtime
>
> I can't, however, think of a good way to do this for Smokeping.
>
> So, two quick questions..
>
> Can anyone think of a way of doing something similar for Smokeping?
> Does anyone have an example of a relatively aggressive probe configuration for fping for monitoring large numbers of devices? I did try modifying parameters to pass to fping as specified in the Smokeping documentation, but I think I must be reading it incorrectly, as I couldn't get the syntax right. I'd be happy to increase the wait by a very slight increment for successive timeouts and to give up at ~800ms or so.
>
> The eventual solution is going to be to split the polling over a large number of servers, but for the time being, I'm stuck running it on a single box. I'm 99% sure it's a case of excessive wait time, as the server is relatively powerful.
>
> Thanks for any help you can provide.
>
>
> --
> Unsubscribe mailto:smokeping-users-request at list.ee.ethz.ch?subject=unsubscribe
> Help mailto:smokeping-users-request at list.ee.ethz.ch?subject=help
> Archive http://www.ee.ethz.ch/~slist/smokeping-users
> WebAdmin http://www.ee.ethz.ch/~slist/lsg2.cgi
>
--
 ______    __   _
/_  __/_  / /  (_) Oetiker @ ISG.EE, ETZ J97, ETH, CH-8092 Zurich
 / // _ \/ _ \/ /  System Manager, Time Lord, Coder, Designer, Coach
/_/ \.__/_.__/_/   http://people.ee.ethz.ch/~oetiker +41(0)1-632-5286