[smokeping-users] Re: Behaviour during DNS outages

Tobias Oetiker oetiker+r at ee.ethz.ch
Tue May 25 23:35:01 MEST 2004


Hi Ralf,

Try this patch:

--- basefork.pm~        2004-05-25 23:32:52.044015000 +0200
+++ basefork.pm 2004-05-25 23:33:47.234383000 +0200
@@ -1,6 +1,7 @@
 package probes::basefork;

 my $DEFAULTFORKS = 5;
+my $DEFAULTTIMEOUT = 5;

 =head1 NAME

@@ -18,6 +19,7 @@
  + MyForkingProbe
  # run this many concurrent processes
  forks = 10
+ timeout = 10

  + MyOtherForkingProbe
  # we don't want any concurrent processes at all for some reason.
@@ -45,10 +47,9 @@
 processes are finished. This continues until all the targets have been
 tested.

-There is a timeout in which each child has to finish. This is determined
-by the Smokeping global database step and the `forks' variable:
-
-S< timeout = E<lt>stepE<gt> / ceiling(E<lt># of targetsE<gt> / E<lt>forksE<gt>)>
+By default, each child has 5 seconds to finish. You can change this
+limit with the `timeout' property of the probe, as shown in the
+example above.

 If the child isn't finished when the timeout occurs, it
 will be killed along with any processes it has started.
@@ -102,15 +103,12 @@
        my @targets = @{$self->targets};
        return unless @targets;

-       my $forks = $self->{properties}{forks};
-       $forks = $DEFAULTFORKS unless defined $forks;
-       $forks = $DEFAULTFORKS if $forks !~ /^\d+$/ or $forks < 1;
-
-       my $step = $self->{cfg}{Database}{step};
-       my $rounds = ceil(@targets / $forks);
-       my $timeout = floor($step / $rounds);
+       my $forks = $self->{properties}{forks} || $DEFAULTFORKS;
+
+       my $timeout = $self->{properties}{timeout} || $DEFAULTTIMEOUT;
+
         $self->{rtts}={};
-       $self->do_debug("forks $forks, rounds $rounds, timeout $timeout");
+       $self->do_debug("forks $forks, timeout $timeout");

        while (@targets) {
                my %targetlookup;
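For comparison, here is a minimal standalone Perl sketch of the two timeout rules: the one the patch removes (the database step divided by the number of fork rounds needed to cover all targets) and the one it adds (the probe's `timeout` property, else 5 seconds). The helper names `old_timeout` and `new_timeout` and the 300-second step used in the example are illustrative assumptions, not part of basefork.pm.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Hypothetical helper: the rule removed by the patch.
# timeout = step / ceiling(# of targets / forks), rounded down.
sub old_timeout {
    my ($step, $n_targets, $forks) = @_;
    my $rounds = ceil($n_targets / $forks);
    return floor($step / $rounds);
}

# Hypothetical helper: the rule added by the patch.
# The probe's 'timeout' property wins; otherwise use the 5 s default.
# Because of '||', a timeout of 0 or '' also falls back to the default.
sub new_timeout {
    my ($properties) = @_;
    my $DEFAULTTIMEOUT = 5;
    return $properties->{timeout} || $DEFAULTTIMEOUT;
}

# Assumed 300 s step, 11 targets, 5 forks -> 3 rounds of polling.
printf "old rule: %d s, new rule: %d s, with timeout = 10: %d s\n",
    old_timeout(300, 11, 5), new_timeout({}), new_timeout({ timeout => 10 });
# prints "old rule: 100 s, new rule: 5 s, with timeout = 10: 10 s"
```

Under the old rule the limit depended on the step and the number of targets, so a probe with only a few targets could let each child run for most of a polling step; with the patch the limit is a fixed per-probe setting.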



cheers
tobi
On May 19, Ralf Hildebrandt wrote:

> My smokeping installation uses (at least I think so!) the dnscache on
> localhost.
> # cat /etc/resolv.conf
> search charite.de
> nameserver 127.0.0.1
>
> Yet whenever there's a network outage, smokeping cannot reach any
> hosts, the probes take ages, and thus "na" is inserted into the RRD
> files.
>
> During that time I get:
>
> May 19 15:42:23 watchmen smokeping[32462]: DNS: opa.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:42:27 watchmen smokeping[32462]: WARNING: smokeping took 297 seconds to complete 1 round of polling. It should complete polling in 90 seconds. You may have unresponsive devices in your setup.
> May 19 15:45:00 watchmen smokeping[32462]: EchoPingHttp: kruemel.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:45:01 watchmen smokeping[32462]: EchoPingHttp: rzdoku.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:48:23 watchmen smokeping[32462]: DNS: opa.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:48:25 watchmen smokeping[32462]: WARNING: smokeping took 295 seconds to complete 1 round of polling. It should complete polling in 90 seconds. You may have unresponsive devices in your setup.
> May 19 15:49:51 watchmen smokeping[3312]: WARNING: /usr/bin/echoping -h / -n 20 rzdoku.charite.de was not happy: Can't connect to server (No route to host)   at /usr/share/perl5/smokeping/Smokeping.pm line 816
> May 19 15:51:00 watchmen smokeping[32462]: EchoPingHttp: kruemel.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:54:28 watchmen smokeping[32462]: DNS: opa.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:54:31 watchmen smokeping[32462]: WARNING: smokeping took 301 seconds to complete 1 round of polling. It should complete polling in 90 seconds. You may have unresponsive devices in your setup.
> May 19 15:57:00 watchmen smokeping[32462]: EchoPingHttp: kruemel.charite.de: timeout (90 s) reached, killing the probe.
> May 19 15:57:01 watchmen smokeping[32462]: EchoPingHttp: rzdoku.charite.de: timeout (90 s) reached, killing the probe.
> May 19 16:00:21 watchmen smokeping[32462]: DNS: opa.charite.de: timeout (90 s) reached, killing the probe.
> May 19 16:00:22 watchmen smokeping[32462]: WARNING: smokeping took 292 seconds to complete 1 round of polling. It should complete polling in 90 seconds. You may have...
>
> Is there a way of specifying a lower timeout?
>
>

-- 
 ______    __   _
/_  __/_  / /  (_) Oetiker @ ISG.EE, ETZ J97, ETH, CH-8092 Zurich
 / // _ \/ _ \/ /  System Manager, Time Lord, Coder, Designer, Coach
/_/ \.__/_.__/_/   http://people.ee.ethz.ch/~oetiker   +41(0)44-632-5286
