[smokeping-users] pocketless alarm question, sending to soon the alarm emails.

Gregory Sloop gregs at sloop.net
Fri Jul 3 22:04:11 CEST 2015


Putting it back on-list. [you emailed me directly - no offense taken - but it's better to be on-list...]

Ok...

Heavy load on a pipe, at least in my experience, doesn't cause much packet loss. It will, however, increase latency. So, I think a test like >10%,*5*,>10%,*5*,>10%,*5* would work. [Hopefully there are no syntax errors there...]
[meaning any three losses of >10% over 3-30m would trigger things.]

I use: 
>15%, *3* ,>15%, *3* ,>15%
>30%, *10* ,>30%, *10*, >30%, *10*, >30%
>50%, *10* ,>50%, *10*, >50%, *10*, >50%
...for a first, second and third level trigger.
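
For a rough sense of how long each of these patterns can span, here is a small Python sketch. It assumes that each threshold entry consumes exactly one sample and that *N* consumes anywhere from 0 to N samples (the "up to N" reading of the wildcard discussed in this thread). The 180-second step is borrowed from Rob's setup, and `pattern_span` is a made-up helper name, not part of Smokeping.

```python
import re


def pattern_span(pattern, step_seconds):
    """Return (min_samples, max_samples, min_minutes, max_minutes).

    Assumption: a threshold such as '>15%' uses exactly one sample,
    and a wildcard such as '*3*' uses between 0 and 3 samples.
    """
    min_samples = 0
    max_samples = 0
    for token in pattern.split(","):
        token = token.strip()
        gap = re.fullmatch(r"\*(\d+)\*", token)
        if gap:
            # Wildcard: adds nothing to the minimum, N to the maximum.
            max_samples += int(gap.group(1))
        else:
            # Threshold entry: always one sample.
            min_samples += 1
            max_samples += 1
    return (
        min_samples,
        max_samples,
        min_samples * step_seconds / 60,
        max_samples * step_seconds / 60,
    )


print(pattern_span(">15%, *3* ,>15%, *3* ,>15%", 180))
```

Under those assumptions, with a 180-second step the first-level pattern covers 3 to 9 samples, i.e. roughly 9 to 27 minutes.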

But - I only use the triggers to generate an MTR - the MTR comes in very handy in arguments with providers [Hello Comcast] when they claim the problem must be somewhere else other than their network. (Though to be fair, the tendency to blame someone else is a *very* strong one in most help-desk/support situations. And it so pisses me off!) The MTR script runs an MTR trace of that path, and emails me the result.
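
A script along those lines could look roughly like the Python sketch below. This is not my actual script: the function names, the addresses and the local SMTP relay are placeholders, and it assumes `mtr` is installed and accepts `--report`, `--report-cycles` and `--no-dns`.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def build_mtr_command(host, cycles=10):
    """Build an mtr command line that prints a one-shot text report."""
    return ["mtr", "--report", "--report-cycles", str(cycles), "--no-dns", host]


def build_report_email(host, report_text, sender, recipient):
    """Wrap the mtr output in a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = f"MTR report for {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(report_text)
    return msg


def run_and_mail(host, sender, recipient, smtp_host="localhost"):
    """Run mtr against host and mail the report (assumes a local SMTP relay)."""
    result = subprocess.run(
        build_mtr_command(host),
        capture_output=True,
        text=True,
        timeout=300,
    )
    # Fall back to stderr so a failed trace still produces a useful email.
    report = result.stdout or result.stderr
    msg = build_report_email(host, report, sender, recipient)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

The alert would then call something like `run_and_mail("192.0.2.1", "smokeping@example.com", "me@example.com")` with the affected target's address.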

I do all my *alerts* with Nagios - using the smokeping plugin.
In those cases, I use something like a warning for >10% loss for 5m, and critical for >40% loss for 5m. [Nagios doesn't use an elaborate trigger system like Smokeping. But I don't get many false-positives with either setup. YMMV.]
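
To illustrate the kind of threshold check this amounts to, here is a minimal Python sketch. It is not the smokeping plugin's actual logic: it assumes the check is handed the loss percentages seen over the last 5 minutes and compares their average against the two thresholds. `classify_loss` is a hypothetical name; the return values follow the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_loss(loss_samples_pct, warn_pct=10.0, crit_pct=40.0):
    """Map the average loss over a window to a Nagios-style status code.

    Assumption: 'loss for 5m' is taken to mean the average of the loss
    samples collected during the last 5 minutes.
    """
    if not loss_samples_pct:
        # No data at all is reported as UNKNOWN rather than OK.
        return UNKNOWN
    average = sum(loss_samples_pct) / len(loss_samples_pct)
    if average > crit_pct:
        return CRITICAL
    if average > warn_pct:
        return WARNING
    return OK


print(classify_loss([12.0, 15.0, 11.0]))
```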

Using Nagios allows me to more carefully manage alerts and when/how/where they're delivered - which makes life a lot easier. For example - no need to buzz me about a non-critical backup link at 2am. But when the smelly stuff does hit the fan, I'll get the alerts I need. So, I abandoned using Smokeping for alerts quite a number of years ago.

I could dig up specifics if you really need them, but that's what I recall off the top of my head.

Cheers,
Greg


Hi Greg 

Thanks for your email. I agree on making it simpler.

The goal is to detect a serious problem ASAP, but avoid false positives generated by people congesting the line with huge downloads.

How would you solve this? Do you monitor for this, and would you be able to share your configuration?

Thanks, Rob


On 3 Jul 2015, at 18:08, Gregory Sloop <gregs at sloop.net> wrote:

A few thoughts - though not exactly an answer to your question:

1) I'm often too dense to grok why more elaborate triggers don't work the way I want, and yours falls into that category. But this seems, IMO, to be a very common problem. So, my solution: Make them simpler. Way simpler. [It's kind of like fancy regexes - I think I'm *so* clever and pat myself on the back. But then I actually run that regex against more real-world data, and well, it doesn't end pretty... So, I usually go back to - simpler is better, unless it's impossible to solve otherwise.]

2) In your case - do you really want to wait 75 minutes to find out a connection is completely down? [Or perhaps you have another trigger that does that.] But even if you do - and this is just my opinion - loss greater than 10% over more than 10-15 minutes is a sign of a _serious_ problem. So, I have simple triggers that let me know if I have even modest loss over fairly short periods of time. Yes, that can increase the number of alerts you see - but if you've got a connection with that many problems, you need to address the underlying problem, not just have the alerts chirp at you less often.

3) Yes, at first glance, your pattern appears right. However, I think the *25* means *up to* 25 samples. [0-25]
So, a >10% loss followed by another >10% loss in the very next sample will match a pattern of >10%,*25*,>10%. It will also match a >10% loss followed by another >10% loss with 1-25 samples of <10% loss between them.

So your example will also trigger with 4 consecutive samples of >10% loss each [i.e. over 4 sample periods]. It would also trigger in the conditions you envision - one >10% loss sample, then another >10% loss 25 samples later, and another 25 samples later etc...
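
To make that reading concrete, here is a small Python sketch of a matcher. It is only a model of the "up to N samples" interpretation described above, not Smokeping's own matching code; `parse_pattern` and `tail_matches` are made-up names, and it assumes the pattern has to line up with the most recent samples.

```python
import operator
import re

OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def parse_pattern(pattern):
    """Turn '>10%,*25*,>10%' into [('>', 10.0), ('gap', 25), ('>', 10.0)]."""
    tokens = []
    for raw in pattern.split(","):
        raw = raw.strip()
        gap = re.fullmatch(r"\*(\d+)\*", raw)
        if gap:
            tokens.append(("gap", int(gap.group(1))))
            continue
        cmp_match = re.fullmatch(r"(>=|<=|==|>|<)\s*(\d+(?:\.\d+)?)%", raw)
        if not cmp_match:
            raise ValueError(f"unrecognised pattern element: {raw!r}")
        tokens.append((cmp_match.group(1), float(cmp_match.group(2))))
    return tokens


def tail_matches(pattern, samples):
    """True if the pattern matches the end of the list of loss percentages."""
    tokens = parse_pattern(pattern)

    def rec(t, s):
        # tokens[:t] must match a run of samples ending at samples[:s].
        if t == 0:
            return True
        kind, value = tokens[t - 1]
        if kind == "gap":
            # Assumed semantics: the wildcard skips 0..value samples of anything.
            return any(rec(t - 1, s - k) for k in range(0, min(value, s) + 1))
        if s == 0:
            return False
        return OPS[kind](samples[s - 1], value) and rec(t - 1, s - 1)

    return rec(len(tokens), len(samples))


PATTERN = ">10%,*25*,>10%,*25*,>10%,*25*,>10%"
print(tail_matches(PATTERN, [0, 0, 20, 20, 20, 20]))
```

In this model, four back-to-back samples above 10% already satisfy the pattern, which would fit an alert arriving within minutes rather than after several 75-minute intervals.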

Again, I think less fancy is more likely to produce a result that's still useful and a lot less trouble to test and verify it works in the conditions you envision.

HTH

-Greg


RdH> Hi team

RdH> I hope you are doing great today?

RdH> What a great tool! I love running it on my RBp2 to monitor the
RdH> internet connection! I have a small question that I can’t solve
RdH> myself; could you help me?

RdH> The goal is to have an alert when the internet line is
RdH> experiencing packet loss for a longer time, not on every
RdH> glitch. I'm using FPingnormal with a step of 180 (meaning 3 minutes).

RdH> The loss pattern I defined is:
RdH> >10%,*25*,>10%,*25*,>10%,*25*,>10% which, based on the how-tos
RdH> provided on the website, means: take 25 samples (which is 25*3
RdH> minutes), so if the packet loss exists at the start, is still >10%
RdH> 75 minutes later, is still >10% 75 minutes after that, and is still
RdH> >10% another 75 minutes later, send out an email.

RdH> But it sends out an email almost immediately. Please see the
RdH> screenshots showing when the packet loss starts and when I received the
RdH> email; that timeframe is not even close to 75 minutes, but more like a few minutes.

RdH> Could you advise how to make the packet loss alarm more reliable,
RdH> so that the loss has to last for at least 1 hour before the email is sent?

RdH> Many thanks! Cheers Rob

RdH> How the system works now (the screenshots were too big):

RdH> -packet loss started about 07:40, was stable around 07:55, and started again at 08:03
RdH> -alarm email received around 08:07, after the second block of packet loss
RdH> -"cleared" email received 3 minutes later, around 08:10

RdH> _______________________________________________
RdH> smokeping-users mailing list
RdH> smokeping-users at lists.oetiker.ch
RdH> https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users

-- 
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: gregs at sloop.net
http://www.sloop.net
---