[smokeping-users] Alerting when a Slave stops sending data
Bill Houle
bhoule at siliconexus.com
Mon Apr 23 23:35:29 CEST 2018
Well, you could run monit on the master, and it would do just the alerting you wanted. I just thought running it on the slaves might be better, since you'd have some actual control over the processes and/or visibility into the host's general resources. But hey, it’s Linux; there are multiple ways to skin that cat.
PS: monit would be free, but (if you implemented it on the slaves) you could also throw some $$ at the M/Monit tool, which could run on the master and give you a “single pane of glass” view into the entire monit+smokeping master-and-slaves ecosystem...
—bill
> On Apr 23, 2018, at 9:39 AM, Gregory Sloop <gregs at sloop.net> wrote:
>
> Monit won't help if the slave went down because someone unplugged it, or some other disaster befell it. It also won't help if the process is still running, but not actually pushing data to the master.
>
> However, it does have the benefit of being easy to install and configure, with no development/debug time required.
>
> For the use cases I've got, I think something running on the master would be helpful more often. [I can't recall a single case where the slave was still up and functional, yet the smokeping slave process was borked, which is the only case where Monit would do anything. But that may just be me.]
>
> -Greg
>
>
> I second the monit suggestion. I have used it for exactly this purpose (watching/restarting slave threads) in the past.
>
> regards,
> Darren
>
> On 23 April 2018 at 05:47, Bill Houle <bhoule at siliconexus.com> wrote:
> As someone who recently had to implement monitoring of non-smokeping processes, might I suggest “monit”? It is a fairly mainstream package that is readily available in the yum and apt-get repos.
>
> Monit is a locally installed (i.e. per-slave) daemon process that can monitor files (by timestamp or checksum), processes (by PID), programs (by exit code), and the system (by resource consumption). It has a flexible config language that can alert/start/stop/exec based on those monitor conditions.
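Monit itself is configured through its own control-file language, not Python. Purely to illustrate what a "process by PID" check amounts to, here is a minimal Python sketch, assuming the slave's smokeping writes its PID to a file; the PID-file path is hypothetical and depends on your package.

```python
import os


def process_alive(pid_file: str) -> bool:
    """Return True if the PID recorded in pid_file belongs to a live process.

    Roughly what a monit "check process ... with pidfile" test verifies.
    """
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or garbled PID file: treat as not running.
        return False
    if pid <= 0:
        # 0 and negative values would address process groups, not one process.
        return False
    try:
        # Signal 0 only performs error checking; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True
```

For example, `process_alive("/var/run/smokeping/smokeping.pid")` (hypothetical path). As pointed out in the thread, this only proves the process exists, not that it is still pushing data to the master.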
>
> I could see monit being used to watch each slave and alert and/or auto-restart the data collection.
>
> —bill
>
>
>
> On Apr 22, 2018, at 11:29 AM, Gregory Sloop <gregs at sloop.net> wrote:
>
> This is an awesome idea - and one I've wished for in the past - but never got around to working on.
> Checking the slave data files modification times seems plausible as a way to check updates - but you'd have to test to be sure. [IIRC that will work though.]
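A minimal sketch of the modification-time check, assuming (as the original poster describes) that a slave's data lives under a directory on the master. The threshold is a hypothetical parameter; since Smokeping updates its RRDs once per step (300 seconds in the stock configuration), a few steps is a plausible starting point, but as noted above this needs testing.

```python
import os
import time
from typing import Optional


def newest_mtime(directory: str) -> Optional[float]:
    """Return the most recent modification time of any file under directory."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file vanished between listing and stat
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def is_stale(directory: str, max_age_s: float, now: Optional[float] = None) -> bool:
    """True if no file under directory was modified within max_age_s seconds."""
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    # An empty directory counts as stale: no data has ever arrived.
    return latest is None or now - latest > max_age_s
```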
>
> Personally, I'd probably try to write it in bash, or something else completely external to smokeping. [Bash because of its few dependencies, though you'll probably want/need something like sendemail for email notifications.]
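If the checker is written in Python rather than bash, the standard library's `smtplib` and `email.message` can replace a separate tool such as sendemail. A minimal sketch, assuming an SMTP relay that accepts mail without authentication; the addresses and subject format are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, slave: str, level: str, detail: str) -> EmailMessage:
    """Build a Notice/Warning email about a slave that stopped sending data."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[smokeping] {level}: slave {slave} is not sending data"
    msg.set_content(detail)
    return msg


def send_alert(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Send msg through an SMTP relay (assumed to need no login or TLS)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to test the wording without a mail server.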
>
> If slaves are behind NAT or something similar, you'll have to have a way to get to the slave for handling a restart, but that's really outside the scope of what you're doing.
>
> Honestly, simply getting notification that a slave is not pushing updates would be more than enough - even without the restart.
>
> Sounds fab to me. And I can't think of a better way, off hand.
>
> -Greg
>
>
> Hello,
>
> I have a Debian Jessie box with Smokeping 2.6 installed on it.
>
> It receives data from Slaves over the Internet (10 slaves or so).
> Each Slave roughly monitors xDSL or fiber links.
>
> Every Monday, I can see that data from one or two slaves is missing.
> Then I remotely restart the smokeping service on the slaves where data is missing.
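The manual restart could be scripted from the master. A minimal Python sketch, assuming key-based ssh access from the master to each slave, a `smokeping` service that `systemctl` can restart, and passwordless sudo for that command on the slave (all assumptions; slaves behind NAT need some other way in, as noted in the reply about NAT).

```python
import subprocess
from typing import List


def restart_command(host: str, service: str = "smokeping") -> List[str]:
    """Build the ssh command that restarts the smokeping service on a slave.

    BatchMode=yes makes ssh fail instead of prompting for a password, and
    sudo -n makes sudo fail instead of prompting, so cron runs never hang.
    """
    return ["ssh", "-o", "BatchMode=yes", host,
            "sudo", "-n", "systemctl", "restart", service]


def restart_slave(host: str, timeout_s: float = 60) -> bool:
    """Run the restart; return True if the remote command exited with status 0."""
    try:
        result = subprocess.run(restart_command(host), timeout=timeout_s,
                                capture_output=True, text=True)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```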
>
> I would like to implement something like:
>
> - if no data at all arrives from a Slave for a given period of time, then restart the Slave's smokeping service and send a Notice email
>
> - if no data at all arrives from a Slave for a longer period of time and a restart of the Slave has already been attempted, then send a Warning email
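The two rules above amount to a small per-slave escalation state machine. A minimal sketch, assuming the caller already knows how long it has been since the slave's data was last updated; the thresholds and action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    restarted: bool = False  # a restart (plus Notice) was already attempted
    warned: bool = False     # a Warning was already sent


def decide(age_s: float, state: SlaveState,
           notice_after_s: float, warn_after_s: float) -> str:
    """Return 'ok', 'restart+notice', 'warning' or 'wait' for one slave.

    age_s is the time since the slave's data was last updated on the master.
    The state is updated in place so repeated runs do not re-alert.
    """
    if age_s <= notice_after_s:
        # Data is fresh (again): reset the escalation.
        state.restarted = False
        state.warned = False
        return "ok"
    if not state.restarted:
        # First rule: stale for a while, so restart and send a Notice.
        state.restarted = True
        return "restart+notice"
    if age_s > warn_after_s and not state.warned:
        # Second rule: still stale after the restart, so send a Warning once.
        state.warned = True
        return "warning"
    return "wait"
```

If the checker runs from cron, the state would have to be persisted between runs (for instance in a small JSON file); that part is left out here.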
>
> As the Slaves' data is stored in a known directory on the Master's filesystem, I think I can detect when data from a slave has not been modified lately by reading the modification times of the files in those directories.
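If the per-slave RRDs sit next to the master's own files rather than in separate directories, the scan has to group files by slave name. A minimal sketch, assuming the master stores a slave's results as `<target>~<slave>.rrd` (verify this naming against your own data directory before relying on it).

```python
import os
from typing import Dict


def newest_update_per_slave(datadir: str) -> Dict[str, float]:
    """Map each slave name to the newest mtime among its RRD files.

    Assumes per-slave RRDs are named '<target>~<slave>.rrd'; files without
    a '~' are treated as the master's own data and ignored.
    """
    latest: Dict[str, float] = {}
    for root, _dirs, files in os.walk(datadir):
        for name in files:
            if not name.endswith(".rrd") or "~" not in name:
                continue
            slave = name[:-len(".rrd")].rsplit("~", 1)[1]
            try:
                mtime = os.path.getmtime(os.path.join(root, name))
            except OSError:
                continue  # file vanished between listing and stat
            latest[slave] = max(mtime, latest.get(slave, mtime))
    return latest
```

A slave that never reported will simply be absent from the result, so it is worth comparing the keys against the list of configured slaves.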
>
> Is there a better way to do this? Smokeping's Alerts settings seem better suited to detecting when my WAN links are slow.
>
> Best regards
>
>
>
> --
> Gregory Sloop, Principal: Sloop Network & Computer Consulting
> Voice: 503.251.0452 x82
> EMail: gregs at sloop.net
> http://www.sloop.net
> ---
> _______________________________________________
> smokeping-users mailing list
> smokeping-users at lists.oetiker.ch
> https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
>