[smokeping-users] miniloss example alert creates a lot of alternating alerts
Peter Kristolaitis
alter3d at alter3d.ca
Thu Jul 22 13:15:16 CEST 2010
Hi Marc,
The solution to your problem depends a bit on the alerting requirements
at your site; for example, do you care if alerts are delayed by one
or more polling cycles in SmokePing?
My first suggestion would be to define an alert something like this:
+someloss
type = loss
pattern = ==0%,==0%,==0%,==0%,==0%,>0%,>0%,>0%
comment = Loss detected for last 3 polling cycles
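A rough way to check what this pattern demands is to test it against the newest samples of a loss history. The Python sketch below is a simplified model, not SmokePing's implementation (SmokePing is written in Perl): the `tail_matches()` helper and the `(operator, percent)` encoding are hypothetical, and it assumes the last pattern element is compared with the newest sample.

```python
import operator

# Hypothetical stand-ins for the comparison operators used in the pattern.
OPS = {"==": operator.eq, ">": operator.gt}

# The someloss pattern above as (operator, percent) pairs, oldest first.
SOMELOSS = [("==", 0)] * 5 + [(">", 0)] * 3


def tail_matches(pattern, history):
    """True if the newest len(pattern) samples satisfy the pattern.

    history is ordered oldest first, so pattern[-1] is compared with the
    newest sample.
    """
    if len(history) < len(pattern):
        return False
    tail = history[-len(pattern):]
    return all(OPS[op](value, limit) for (op, limit), value in zip(pattern, tail))


onset = [0, 0, 0, 0, 0, 5, 10, 5]       # five clean cycles, then three lossy ones
scattered = [0, 0, 10, 0, 0, 5, 0, 5]   # last 8 samples of Marc's 00:35:23 row
persistent = [0, 0, 0, 0, 5, 5, 5, 5]   # loss continues into a fourth cycle

for name, history in [("onset", onset), ("scattered", scattered), ("persistent", persistent)]:
    print(name, tail_matches(SOMELOSS, history))
```

Under this model only `onset` matches. `persistent` does not, because once loss lasts a fourth cycle the fifth-newest sample is no longer 0%, so the pattern as written describes the onset of loss rather than loss that keeps going.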
This alert definition will trigger when three *consecutive* polling
cycles show some packet loss after five loss-free cycles. That is different
from the alert you tried (>0%, *12*, >0%, *12*, >0%, *12*), because each
"*12*" in your pattern acts as a wildcard that matches up to 12 values of
ANYTHING. So your alert pattern basically says "If we've seen >0% three
times in the last 39 poll cycles, trigger an alert." Based on the data
samples you provided, I believe a consecutive model would suit your needs
better.
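To see how the wildcard patterns behave on Marc's data, here is a small Python model of SmokePing-style matching. It is not SmokePing's code: the parser, the `matches()` helper and the rule that `*N*` stands for 0 to N arbitrary samples are assumptions. It also assumes the stock miniloss pattern is `>0%,*12*,>0%,*12*,>0%`, which Marc's proposals appear to be derived from.

```python
import operator
import re

# Comparison operators assumed for pattern elements such as ">0%" or ">=0%".
OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}
ELEMENT = re.compile(r"(==|>=|<=|>|<)\s*(\d+(?:\.\d+)?)%")
WILDCARD = re.compile(r"\*(\d+)\*")


def parse(pattern):
    """Split 'a,b,c' into ('*', n) wildcards or (operator, percent) checks."""
    elems = []
    for tok in pattern.split(","):
        tok = tok.strip()
        if m := WILDCARD.fullmatch(tok):
            elems.append(("*", int(m.group(1))))
        elif m := ELEMENT.fullmatch(tok):
            elems.append((m.group(1), float(m.group(2))))
        else:
            raise ValueError(f"unsupported pattern element: {tok!r}")
    return elems


def matches(pattern, history):
    """True if pattern matches the newest end of history (oldest sample first).

    Assumption: '*N*' stands for anywhere from 0 to N arbitrary samples.
    """
    elems = parse(pattern)

    def walk(i, j):
        # i: pattern element being checked, walking from the last one backwards
        # j: number of samples not yet consumed, i.e. history[:j]
        if i < 0:
            return True
        kind, arg = elems[i]
        if kind == "*":
            return any(walk(i - 1, j - k) for k in range(min(arg, j) + 1))
        return j > 0 and OPS[kind](history[j - 1], arg) and walk(i - 1, j - 1)

    return walk(len(elems) - 1, len(history))


# Marc's 00:35:23 and 00:35:52 loss rows (percent, oldest first).
ROW_003523 = [0] * 14 + [0, 0, 0, 0, 0, 0, 0, 10, 0, 0, 5, 0, 5]
ROW_003552 = [0] * 14 + [0, 0, 0, 0, 0, 0, 10, 0, 0, 5, 0, 5, 0]

PATTERNS = {
    "assumed stock miniloss": ">0%,*12*,>0%,*12*,>0%",
    "proposal 1": ">5%,*12*,>5%,*12*,>5%",
    "proposal 2": ">0%,*12*,>0%,*12*,>0%,*12*",
    "proposal 3": ">0%,*12*,>0%,*12*,>0%,*12*,>=0%",
}

results = {
    name: [matches(p, row) for row in (ROW_003523, ROW_003552)]
    for name, p in PATTERNS.items()
}
for name, hits in results.items():
    print(f"{name}: {hits}")
```

Under these assumptions the stock pattern matches the 00:35:23 row but not the 00:35:52 row, which is the raise/clear flap Marc saw, because its last element requires the newest sample to be >0%. Proposal 2 matches both rows. Proposal 3 only matches the later row, one cycle behind, because its trailing `>=0%` consumes the newest sample. Proposal 1 matches neither.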
If you need to get alerts sooner for actual problems, consider defining
a second alert as well... something like:
+bigloss
type = loss
pattern = ==0%,==0%,==0%,>20%
comment = We have sudden, severe packet loss
If you enable both alerts on your hosts, you will be alerted to
persistent, low-to-moderate (1-20%) loss on the links, and alerted
immediately, on the first polling cycle above 20% loss, when there are
bigger problems.
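The interplay of the two alerts can be replayed one polling cycle at a time. This is again a hypothetical Python model rather than SmokePing itself: `replay()` and the sample stream are made up for illustration, and the sketch logs only raise/clear transitions, in the style of Marc's listing.

```python
import operator

OPS = {"==": operator.eq, ">": operator.gt}

# The two alerts above as (operator, percent) pairs, oldest first.
ALERTS = {
    "someloss": [("==", 0)] * 5 + [(">", 0)] * 3,
    "bigloss": [("==", 0)] * 3 + [(">", 20)],
}


def tail_matches(pattern, history):
    """True if the newest len(pattern) samples satisfy the pattern."""
    if len(history) < len(pattern):
        return False
    tail = history[-len(pattern):]
    return all(OPS[op](value, limit) for (op, limit), value in zip(pattern, tail))


def replay(samples):
    """Feed samples one polling cycle at a time; log raise/clear transitions."""
    history, active, events = [], set(), []
    for cycle, loss in enumerate(samples):
        history.append(loss)
        for name, pattern in ALERTS.items():
            hit = tail_matches(pattern, history)
            if hit and name not in active:
                active.add(name)
                events.append((cycle, name, "raised"))
            elif not hit and name in active:
                active.discard(name)
                events.append((cycle, name, "cleared"))
    return events


# Hypothetical loss samples (percent), one per polling cycle.
samples = [0, 0, 0, 0, 0, 5, 5, 10, 0, 0, 0, 0, 0, 0, 40, 0]
for event in replay(samples):
    print(event)
```

With these samples the sketch reports `someloss` raised at cycle 7 and cleared at cycle 8, and `bigloss` raised at cycle 14, the first cycle with 40% loss, and cleared at cycle 15.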
I think these rules will probably serve you well as a baseline, but
don't be afraid to experiment. I find it usually takes a couple of weeks
of testing & tweaking to find an optimal set of alerts for any given
network, simply because topologies and architectures differ.
- Peter
On 21/07/2010 2:44 AM, Marc Haber wrote:
> Hi,
>
> when a network device is quite busy (for example, when backup of some
> servers connected to this device is going on), it's going to drop some
> packets, resulting in loss data like this:
>
> 00:35:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%
> 00:35:52
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%, 0%
> 00:48:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%
> 00:49:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%
> 00:49:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%
> 00:50:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%, 0%
> 00:53:54
> loss: 0%, 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%,
> 0%, 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%
> 00:54:24
> loss: 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%
>
> When the miniloss alert from the smokeping_config example is defined,
> the alarm gets raised and cleared multiple times over this rather
> short period of time:
>
> 00:35:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%
> alarm raised
> 00:35:52
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%, 0%
> alarm cleared
> 00:48:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%
> alarm raised
> 00:49:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%
> alarm cleared
> 00:49:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%
> alarm raised
> 00:50:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%, 0%
> alarm cleared
> 00:53:54
> loss: 0%, 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%,
> 0%, 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%
> alarm raised
> 00:54:24
> loss: 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%
> alarm cleared
>
> I am wondering whether it makes sense to clear the alarm just because
> there is a 0% in the last slot of the data being considered. This
> causes the alarm to flap in the case of occasional packet loss.
>
> I am thinking of either modifying the alarm so it only goes off for
> loss > 5%, like
>
> +miniloss
> type = loss
> # in percent
> pattern = >5%,*12*,>5%,*12*,>5%
> comment = detected loss 3 times over the last two hours
>
> or to have it stay raised even if the current loss is 0%, like
>
> +miniloss
> type = loss
> # in percent
> pattern = >0%,*12*,>0%,*12*,>0%,*12*
> comment = detected loss 3 times over the last two hours
>
> or
> +miniloss
> type = loss
> # in percent
> pattern = >0%,*12*,>0%,*12*,>0%,*12*,>=0%
> comment = detected loss 3 times over the last two hours
>
> I would like to ask the more experienced users how you would act in my
> position. Would you ditch the miniloss alert altogether, would you
> modify it, and if so, how?
>
> Greetings
> Marc