[smokeping-users] Alert script, with MTR

Gregory Sloop gregs at sloop.net
Fri Jun 27 18:00:37 CEST 2014

So, I've solved my problem, and it was as I envisioned. [The alert script/MTR seems to run in the foreground, and everything in smokeping stops until the script finishes, so the RRDs stop getting updated, etc.]

So, now, I call a bash script from smokeping like so:

Contents of call-smoke-mtr:

#!/bin/bash
/etc/smokeping/smoke-mtr "$@" &

The "$@" is shorthand for "all the command line args, each passed along as its own word."
So we just run the real script with all the same input args we got called with, and do it in the background.
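
A quick way to see why the quotes around $@ matter. This is only an illustrative sketch, not part of the wrapper: the function names (show_args, relay_quoted, relay_unquoted) and the sample arguments are made up. It passes the same arguments along with and without the quotes and prints what arrives on the other side.

```shell
#!/bin/bash
# Illustrative only: these helper names are hypothetical, not from the wrapper.

# Print each argument in angle brackets so the word boundaries are visible.
show_args() { printf '<%s>' "$@"; echo; }

# Quoted: every argument is passed on exactly as it was received.
relay_quoted() { show_args "$@"; }

# Unquoted: an argument containing spaces is split into several words.
relay_unquoted() { show_args $@; }

relay_quoted   "loss alert" "10.0.0.1"   # prints <loss alert><10.0.0.1>
relay_unquoted "loss alert" "10.0.0.1"   # prints <loss><alert><10.0.0.1>
```

So if any of the args handed to the alert script contain spaces, the quoted form is the one that hands them to the real script intact.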

The smoke-mtr is the "real" python script that actually does the MTR, generates the report and sends it.

And these all go into the background and run just fine.

A warning - if you kicked off some large number [I have no idea what large number might be a problem - I'd guess perhaps 100+] there's no control in my setup to keep us from consuming all the available resources and killing the smokeping monitor.

Perhaps a few lines to keep the number of background processes from exceeding some threshold would be good - but I've not tinkered with that.
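
One possible way to do that, as an untested-in-production sketch: claim a numbered "slot" directory before starting the real script, and skip the run if all slots are taken. Everything named here is an assumption for illustration (MAX_JOBS, SLOT_DIR, claim_slot, run_capped, the limit of 10, the /tmp path), as is sending the job's output to /dev/null.

```shell
#!/bin/bash
# Hypothetical settings, not from the original setup.
MAX_JOBS=${MAX_JOBS:-10}
SLOT_DIR=${SLOT_DIR:-/tmp/smoke-mtr-slots}

# Try to claim a free slot. Prints the slot path on success, returns 1 if
# every slot is in use. mkdir either creates the directory or fails, so two
# wrappers started at the same moment cannot both claim the same slot.
claim_slot() {
    local i=1
    mkdir -p "$SLOT_DIR"
    while [ "$i" -le "$MAX_JOBS" ]; do
        if mkdir "$SLOT_DIR/slot.$i" 2>/dev/null; then
            echo "$SLOT_DIR/slot.$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Run a command in the background if a slot is free, and give the slot
# back when the command exits. Returns 1 without running it otherwise.
run_capped() {
    local slot
    slot=$(claim_slot) || return 1
    ( "$@"; rmdir "$slot" ) >/dev/null 2>&1 &
}
```

With this in call-smoke-mtr, the last line would become something like: run_capped /etc/smokeping/smoke-mtr "$@"
One caveat: if a background job gets killed hard before it can remove its slot directory, that slot stays taken until someone cleans it up by hand.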

But the above works, and prevents smokeping from choking on writing the RRDs while the MTR is running - so no gaps in your RRDs, no nagios complaining bitterly, etc. :)


So, I've changed the thread title...

A few updates.

I didn't think it was load, so I tried running the Alert/MTR script *by hand/manually*, while smokeping and nagios are doing their thing - just to test what load was and what the effect was.

I ran about 15 alerts/MTR runs in quick succession - all while smokeping and nagios were also running doing their work.
Load does peak higher than I suspected - at ~2 for the 1 min average - but those queries complete fairly quickly and load drops back to around 0.3-0.4. [and this is way more load from the MTR script than should have been occurring in the automated runs I was doing before.]

However, even with the much higher load, there were no drops in writing the smokeping RRDs and Nagios doesn't complain about them.

So, I think it's safe to say that it's not a load issue - it's that for some reason, when smokeping runs the "alert" script, it has to wait for that script to finish before it goes on to do anything more - and this causes the other issues.

So, I also tried appending a "&" to the smokeping alert line in the config - in the hopes that it would run the process in the background. No luck. [I'd guess it places the "&" before the passed arguments and the script doesn't get any of the passed arguments it needs.]

I thought about creating a script that would run a second script and append the "&" to it, and run it.

"MTR-Create" [a (bash?) script] - would take the arguments it was passed from smokeping [you'd call MTR-Create from the smokeping alert]
MTR-Create would simply take its arguments and call the "regular" MTR/Alert, passing along those arguments and appending "&" at the end to run it in the background.

I suspect I can struggle my way through doing that - but does any BASH guru know how best to do that, offhand? It could save me a lot of poking, trial and error! :)




How many alerts are firing when your box starts to bog?  I have been running my fork of the mtr script for several months now with no issues.  Matter of fact, I am now working on an expanded version that will dump the mtr's into mysql for easy access for our NOC.  Currently, I just have the script appending a file in /var/log with each mtr.

Could you be pushing the box you are running from too hard?

On Wed, Jun 25, 2014 at 8:12 PM, Gregory Sloop <gregs at sloop.net> wrote:

FP> On 21.02.2014 06:42, Philip Wehunt wrote:
>> I could hackishly work around this in my python but I wanted to
>> identify if I am doing something wrong on the SP side or if it is a
>> bug. Mainly in the spirit of KISS. I don't like to let hackish
>> scripts linger.

FP> You probably found the same script on gist, but here's my version[1]
FP> which doesn't fail when the 6th arg is missing. It will not add "
FP> cleared" to the subject without the arg, but it will send you the report.

FP> [1]: https://git.server-speed.net/users/flo/bin/tree/smokemtr.py

FP> From the documentation in smokeping_config I'd say this is a bug, but
FP> given I get my mails I didn't bother fixing it yet.

Florian et al.,

First, thanks for the script. I've had to mod it a bit - my MTR isn't quite the same as yours and I want to use a non-local SMTP server and port - but those were easy mods. [MTR is in a different spot too, again easy mod.]

So, I'm very excited about the prospects of automated mtr stats when a smokeping alert gets triggered - however, I've run into a substantial snag.

I use a 60s poll in smokeping, and if a bunch of [smokeping] alerts kick off at once, each MTR takes a while to run, and that stalls smokeping.

This causes a ripple effect, and a raft of nagios alerts, since I use a smokeping nagios plug-in. When SP stalls [running the mtr's] the RRDs go dry, and then nagios starts alerting on an "unknown" target state. ["This RRD hasn't been written to in 180s" etc.]

So, is there some way I can fork off the mtr script, and allow smokeping to continue while the mtr stats are gathered and a report sent?

[This is something I'm woefully un-knowledgeable about...]


smokeping-users mailing list
smokeping-users at lists.oetiker.ch

Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: gregs at sloop.net
