[smokeping-users] Packet loss alarm question: alarm emails are sent too soon

Rob de Hoog rob at robdehoog.nl
Sat Jul 4 09:11:08 CEST 2015


Hi Greg

Thanks for answering. I've configured your three-level triggers and will see what happens :)

Running Nagios on the Raspberry Pi 2 as well would be too heavy, I guess.

Let's monitor it and see what the alarms turn up.

Thanks for your help


Regards, Rob de Hoog
Sent from my iPhone

> On 3 Jul 2015, at 22:04, Gregory Sloop <gregs at sloop.net> wrote:
> 
> Putting it back on-list. [you emailed me directly - no offense taken - but it's better to be on-list...]
> 
> Ok...
> 
> Heavy load on a pipe, at least in my experience, doesn't cause much packet loss. It will, however, increase latency. So I think a test like >10%,*5*,>10%,*5*,>10%,*5* would do [hopefully there are no syntax errors there...]
> [meaning any three losses of >10% over 3-30m would trigger things.]
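To make the latency-versus-loss point concrete, here is a minimal Python sketch (not Smokeping's own code) that reduces one polling step of ping results to a loss percentage and a median RTT, and only treats the step as alert-worthy when loss crosses the threshold. The 10% loss limit mirrors the trigger above; the 100 ms "congested" limit and the function names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple


def summarize_step(rtts_ms: List[Optional[float]]) -> Tuple[float, Optional[float]]:
    """Return (loss %, median RTT in ms) for one polling step.

    Each entry is one ping's round-trip time, or None if that ping was lost.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping result")
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    return loss_pct, (median(answered) if answered else None)


def classify(loss_pct: float, median_ms: Optional[float],
             loss_limit: float = 10.0, rtt_limit_ms: float = 100.0) -> str:
    """Label a step; only 'loss' should feed the alert pattern."""
    if loss_pct > loss_limit:
        return "loss"        # packets are disappearing: real trouble
    if median_ms is not None and median_ms > rtt_limit_ms:
        return "congested"   # slow but intact, e.g. a big download (hypothetical limit)
    return "ok"


# A busy line: all 20 pings answered, just slowly.
print(classify(*summarize_step([180.0] * 20)))               # congested
# A flaky line: 5 of 20 pings lost.
print(classify(*summarize_step([20.0] * 15 + [None] * 5)))   # loss
```

The point of the split is that a line saturated by downloads shows up as "congested" and never feeds the loss trigger, while genuine loss still does.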
> 
> I use: 
> >15%,*3*,>15%,*3*,>15%
> >30%,*10*,>30%,*10*,>30%,*10*,>30%
> >50%,*10*,>50%,*10*,>50%,*10*,>50%
> ...for first-, second- and third-level triggers.
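A quick way to see how much history each of these levels spans is to count samples: each comparison such as `>15%` uses exactly one sample, while `*N*` may skip anywhere from 0 to N. The Python sketch below does that arithmetic under this simplified reading of the syntax (an assumption, not Smokeping's parser), using the 180-second step from Rob's setup; `pattern_span` is a hypothetical helper name.

```python
import re
from typing import Tuple

WILDCARD = re.compile(r"^\*(\d+)\*$")  # matches entries like *3* or *10*


def pattern_span(pattern: str) -> Tuple[int, int]:
    """Fewest and most samples a pattern can cover (simplified model)."""
    fewest = most = 0
    for entry in (e.strip() for e in pattern.split(",")):
        m = WILDCARD.match(entry)
        if m:
            most += int(m.group(1))  # *N* may skip anywhere from 0 to N samples
        else:
            fewest += 1              # a comparison like >15% uses exactly one sample
            most += 1
    return fewest, most


STEP_S = 180  # Rob's step, in seconds

for level in (">15%,*3*,>15%,*3*,>15%",
              ">30%,*10*,>30%,*10*,>30%,*10*,>30%",
              ">50%,*10*,>50%,*10*,>50%,*10*,>50%"):
    lo, hi = pattern_span(level)
    print(f"{level}: {lo}-{hi} samples, ~{lo * STEP_S // 60}-{hi * STEP_S // 60} min")
```

With a 180 s step that works out to roughly 9-27 minutes for the first level and 12-102 minutes for the other two.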
> 
> But - I only use the triggers to generate an MTR - the MTR comes in very handy in arguments with providers [Hello Comcast] when they claim the problem must be somewhere other than their network. (Though to be fair, the tendency to blame someone else is a *very* strong one in most help-desk/support situations. And it so pisses me off!) The MTR script runs an MTR trace of that path and emails me the result.
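The MTR script itself isn't shown in the thread. A minimal Python sketch of the idea could look like this, assuming `mtr` is installed (it usually needs root or setuid to send its probes) and a local SMTP server accepts mail. The addresses, function names and one-argument `HOST` interface are hypothetical; how to hook it into Smokeping's alert `to` setting is left to the Smokeping documentation.

```python
import smtplib
import subprocess
import sys
from email.message import EmailMessage
from typing import List

# Hypothetical settings: adjust for your own mail setup.
SMTP_HOST = "localhost"
MAIL_FROM = "smokeping@example.org"
MAIL_TO = "noc@example.org"


def build_mtr_command(host: str, cycles: int = 10) -> List[str]:
    # --report prints one summary after `cycles` rounds; -n skips reverse DNS lookups.
    return ["mtr", "--report", "--report-cycles", str(cycles), "-n", host]


def build_message(host: str, report: str) -> EmailMessage:
    msg = EmailMessage()
    msg["Subject"] = f"MTR report for {host}"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content(report)
    return msg


def main(host: str) -> None:
    # Run the trace; fall back to stderr so a failed mtr still produces a useful mail.
    result = subprocess.run(build_mtr_command(host), capture_output=True,
                            text=True, timeout=300)
    report = result.stdout or result.stderr
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_message(host, report))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} HOST", file=sys.stderr)
```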
> 
> I do all my *alerts* with Nagios - using the Smokeping plugin.
> In those cases, I use something like a warning for >10% loss for 5m, and a critical for >40% loss for 5m. [Nagios doesn't use an elaborate trigger system like Smokeping's. But I don't get many false positives with either setup. YMMV.]
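The warning/critical split above maps naturally onto the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). This is a hypothetical Python sketch, not the Smokeping plugin Greg mentions: it takes the loss readings from the last ~5 minutes and only reports a state if every reading stays above the threshold. In a real Nagios setup, the "for 5m" part is more often handled by Nagios's own retry and soft-state settings.

```python
from typing import Sequence

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def loss_state(recent_loss_pct: Sequence[float],
               warn: float = 10.0, crit: float = 40.0) -> int:
    """Map the loss readings from the last ~5 minutes to a Nagios state.

    A threshold only counts as crossed if every reading in the window is above it,
    so one bad sample on its own does not raise anything.
    """
    if not recent_loss_pct:
        return UNKNOWN              # no data is its own kind of problem
    floor = min(recent_loss_pct)    # the loss level the window never dropped below
    if floor > crit:
        return CRITICAL
    if floor > warn:
        return WARNING
    return OK


# A real plugin would print one status line like this and exit with the code.
state = loss_state([55.0, 48.0])
print(f"LOSS {NAMES[state]}")  # LOSS CRITICAL
```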
> 
> Using Nagios allows me to manage alerts more carefully, including when, how and where they're delivered - which makes life a lot easier. For example - there's no need to buzz me about a non-critical backup link at 2am. But when the smelly stuff does hit the fan, I'll get the alerts I need. So I abandoned using Smokeping for alerts quite a number of years ago.
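The "don't buzz me at 2am about a backup link" policy boils down to a small decision rule. Nagios expresses this through its own notification periods; the Python sketch below only illustrates the logic, with a hypothetical `critical` flag per link and a hypothetical 08:00-18:00 window for everything that isn't a critical outage.

```python
from dataclasses import dataclass
from datetime import datetime

CRITICAL = 2                  # Nagios-style state code
OFFICE_HOURS = range(8, 18)   # hypothetical: 08:00-17:59 local time


@dataclass
class Link:
    name: str
    critical: bool            # hypothetical flag: production path vs. backup link


def should_notify_now(link: Link, state: int, now: datetime) -> bool:
    """Notify immediately only for critical problems on critical links;
    anything else waits for office hours."""
    if state == 0:
        return False          # nothing wrong
    if link.critical and state == CRITICAL:
        return True           # the smelly stuff has hit the fan: any hour
    return now.hour in OFFICE_HOURS


two_am = datetime(2015, 7, 4, 2, 0)
print(should_notify_now(Link("backup", critical=False), CRITICAL, two_am))  # False
print(should_notify_now(Link("main", critical=True), CRITICAL, two_am))     # True
```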
> 
> I could dig up specifics if you really need them, but that's what I recall off the top of my head.
> 
> Cheers,
> Greg
> 
> 
> Hi Greg 
> 
> Thanks for your email; I agree on making it simpler.
> 
> My goal is to detect a serious problem ASAP, but to avoid false positives caused by people congesting the line with huge downloads.
> 
> How would you solve this? Do you monitor for this, and could you share your configuration?
> 
> Thanks, Rob
> 
> 
> On 3 Jul 2015, at 18:08, Gregory Sloop <gregs at sloop.net> wrote:
> 
> Re: [smokeping-users] Packet loss alarm question: alarm emails are sent too soon
> A few thoughts - though not exactly an answer to your question:
> 
> 1) I'm often too dense to grok why more elaborate triggers don't work the way I want, and yours falls into that category. But this seems, IMO, to be a very common problem. So, my solution: make them simpler. Way simpler. [It's a bit like fancy regexes - I think I'm *so* clever and pat myself on the back. But then I actually run that regex against more real-world data, and, well, it doesn't end prettily... So I usually go back to: simpler is better, unless the problem is impossible to solve otherwise.]
> 
> 2) In your case - do you really want to wait 75 minutes to find out a connection is completely down? [Or perhaps you have another trigger for that.] Even if you do - and this is just my opinion - loss greater than 10% lasting more than 10-15 minutes is a sign of a _serious_ problem. So I have simple triggers that let me know if I have even modest loss over fairly short periods of time. Yes, that can increase the number of alerts you see - but if you've got a connection with that many problems, you need to address the underlying problem, not just have it chirp at you less often.
> 
> 3) Yes, at first glance, your pattern appears right. However, I think the *25* means *up to* 25 samples. [0-25]
> So a >10% loss followed by another >10% loss in the very next sample will match the pattern >10%,*25*,>10%. So will a >10% loss followed by another >10% loss with 1-25 samples of <10% loss between them.
> 
> So your example will also trigger on 4 consecutive samples of >10% loss each [i.e. over just 4 sample periods]. It would also trigger in the conditions you envision: one >10% loss sample, then another >10% loss 25 samples later, another 25 samples after that, etc.
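Greg's reading of `*25*` can be checked with a small simulation. This Python sketch uses a simplified model of Smokeping's loss patterns (an assumption; the real matcher may differ in details): the last entry is compared with the newest sample, each comparison consumes one sample, `*N*` skips 0 to N samples of any value, and anything older than the matched span is ignored. It treats the first entry of Rob's pattern as `>10%`, assuming the `>` was lost in his message. `parse` and `matches` are hypothetical names.

```python
import operator
import re
from typing import List, Sequence

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}
WILDCARD = re.compile(r"^\*(\d+)\*$")                        # *N*
COMPARE = re.compile(r"^(>=|<=|==|>|<)(\d+(?:\.\d+)?)%$")   # e.g. >10%


def parse(pattern: str) -> List[tuple]:
    """Split 'a,b,...' into ('any', N) and ('cmp', op, limit) tokens."""
    tokens = []
    for entry in (e.strip() for e in pattern.split(",")):
        wild, cmp = WILDCARD.match(entry), COMPARE.match(entry)
        if wild:
            tokens.append(("any", int(wild.group(1))))
        elif cmp:
            tokens.append(("cmp", OPS[cmp.group(1)], float(cmp.group(2))))
        else:
            raise ValueError(f"cannot parse pattern entry {entry!r}")
    return tokens


def matches(pattern: str, history: Sequence[float]) -> bool:
    """True if the pattern fits the end of `history` (loss %, oldest first)."""
    tokens = parse(pattern)

    def walk(t: int, h: int) -> bool:
        # t: next pattern token to match, working right to left
        # h: number of history samples still available (newest unused is h-1)
        if t < 0:
            return True                    # pattern used up; older history ignored
        token = tokens[t]
        if token[0] == "any":
            # *N*: try skipping 0, 1, ... N samples (never more than are left)
            return any(walk(t - 1, h - k) for k in range(min(token[1], h) + 1))
        _, op, limit = token
        return h > 0 and op(history[h - 1], limit) and walk(t - 1, h - 1)

    return walk(len(tokens) - 1, len(history))


ROB = ">10%,*25*,>10%,*25*,>10%,*25*,>10%"  # '>' on the first entry is an assumption
print(matches(ROB, [0, 0, 0, 20, 30, 25, 15]))  # True: four bad samples in a row
print(matches(ROB, [0, 0, 0, 0, 0, 0, 20]))     # False: only the newest is bad
```

Under this model, four bad samples in a row satisfy the whole pattern, which at a 180 s step is roughly 12 minutes of history rather than the 225 minutes Rob expected.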
> 
> Again, I think less fancy is more likely to produce a result that's still useful and a lot less trouble to test and verify it works in the conditions you envision.
> 
> HTH
> 
> -Greg
> 
> 
> RdH> Hi team
> 
> RdH> I hope you are doing well today.
> 
> RdH> What a great tool! I love running it on my Raspberry Pi 2 to monitor the
> RdH> internet connection. I have a small question that I can't solve
> RdH> myself; could you help me?
> 
> RdH> The goal is to get an alert when the internet line experiences
> RdH> packet loss for a longer time, not on every glitch. I'm using
> RdH> FPingnormal with a step of 180 (i.e. 3 minutes).
> 
> RdH> The loss pattern I defined is:
> RdH> 10%,*25*,>10%,*25*,>10%,*25*,>10%
> RdH> which, based on the how-tos on the website, I read as: take 25
> RdH> samples (25*3 = 75 minutes), so if there is packet loss at the
> RdH> start, still >10% 75 minutes later, still >10% another 75 minutes
> RdH> later and still >10% another 75 minutes after that, send out an
> RdH> email.
> 
> RdH> But it sends out an email almost immediately. Please see the
> RdH> screenshots showing when the packet loss starts and when I received
> RdH> the email: that time frame is nowhere near 75 minutes, more like a few minutes.
> 
> RdH> Could you advise how to make the packet loss alarm more reliable, so
> RdH> that the loss must last for at least 1 hour before the email is sent?
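If the requirement really is "loss on every sample for a full hour", the pattern needs one `>10%` entry per sample and no `*N*` wildcards at all, since a wildcard only allows gaps. At Rob's 180 s step that is 60 / 3 = 20 entries. A small Python helper (hypothetical, and assuming Smokeping accepts a pattern of that length) can generate the string. Note that a single clean sample anywhere in the hour resets it, and, per Greg's point 2, an hour is a long time to wait.

```python
import math


def sustained_loss_pattern(threshold_pct: int, minutes: float, step_s: int) -> str:
    """One >threshold% entry per sample in the window, with no *N* gaps allowed."""
    samples = math.ceil(minutes * 60 / step_s)
    return ",".join([f">{threshold_pct}%"] * samples)


pattern = sustained_loss_pattern(10, 60, 180)
print(pattern.count(">10%"))  # 20 entries: 60 min / 3 min per sample
```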
> 
> RdH> Many thanks! Cheers Rob
> 
> RdH> How the system behaves now (the screenshots were too big):
> 
> RdH> - packet loss started at about 07:40, was stable around 07:55 and started again at 08:03
> RdH> - alarm email received around 08:07, after the second block of packet loss
> RdH> - "cleared" email received 3 minutes later, around 08:10
> 
> RdH> _______________________________________________
> RdH> smokeping-users mailing list
> RdH> smokeping-users at lists.oetiker.ch
> RdH> https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
> 
> -- 
> Gregory Sloop, Principal: Sloop Network & Computer Consulting
> Voice: 503.251.0452 x82
> EMail: gregs at sloop.net
> http://www.sloop.net
> ---
> 
> 

