[smokeping-users] 500 Error on Master crashes Slaves?
fligor at illinois.edu
Tue Jul 17 16:18:33 CEST 2012
It's probably not the 500 error itself. More likely the problem was the HUP not finding the PID correctly.
I'm hitting a very similar problem a lot right now. I just took over running smokeping when a co-worker left, so I was starting from no experience other than looking at the graphs. The first thing I had to do was move from an RHEL 4 server to an RHEL 6 server. I have another post to make asking for help tuning fcgi, as I had to go from a perperl version (2.4.3) to an fcgi version (2.6.8) and it's not too stable (the system load is either .5 or 150, with not much in between).
The last person running smokeping hadn't done some config file updates, and the load on the system was very high, so he'd removed some slaves. As I put them all back a little at a time, I would update the config file, and the slaves would all get this error and stop sending updates. What I traced it down to is that the slaves are not running in daemon mode, but the software only writes out a PID file in daemon mode. Since my startup script removes old PID files, I get a "PID file missing" error when the HUP happens. You might check that the PID file on your slaves is correct for the current process, not an old one that didn't get deleted in the past.
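In case it helps, here is a rough Python sketch of that check. It is not part of smokeping; the function name pid_file_status is made up for illustration, and you would pass it whatever PID file path your own startup script uses. It only reports whether the file is missing, unreadable as a number, points at a dead process (stale), or points at a live one.

```python
import os


def pid_file_status(pid_path):
    """Classify a PID file as 'missing', 'invalid', 'stale', or 'running'.

    Hypothetical helper for illustration only; pid_path is whatever
    location your own startup script uses for the PID file.
    """
    try:
        with open(pid_path) as fh:
            text = fh.read().strip()
    except FileNotFoundError:
        return "missing"

    try:
        pid = int(text)
    except ValueError:
        return "invalid"
    if pid <= 0:
        return "invalid"

    try:
        # Signal 0 delivers nothing; it only checks that the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        # The process exists but is owned by another user.
        return "running"
    return "running"
```

Note that "running" only means some process has that PID. It does not prove the process is smokeping, so on a busy box you may still want to compare against ps output.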
I've looked through the docs and tried to search the mailing list archives, but haven't figured this one out yet. So can anyone tell me whether slaves have to be run "--nodaemon" for some reason? It would be handy to have the PID file there. Things would be a lot nicer, since it tries to do the HUP in nodaemon mode even though there is no PID file.
On Jul 17, 2012, at 5:22, Jason Yates wrote:
> There was a config error on my master smokeping box (as a result of somebody trying to add a menu option without specifying a menu or title param). The error itself was resolved within minutes; however, during the “outage” the slaves must have polled and as a result logged the following (logs are in reverse time order).
> Mon Jul 16 15:30:52 2012 - ERROR: no instance of SmokePing running (pid 30738)?
> Mon Jul 16 15:30:52 2012 - server has new config for me ... HUPing the parent
> Mon Jul 16 15:30:52 2012 - Sent data to Server and got new config in response.
> <FPING is still running here. Logs cut.>
> Mon Jul 16 15:30:38 2012 - ERROR: we did not get config from the master. Maybe we are not configured as a slave for any of the targets on the master ?
> Mon Jul 16 15:30:38 2012 - WARNING Master said 500 Internal Server Error
> Mon Jul 16 15:30:34 2012 - Got HUP signal.
> Mon Jul 16 15:30:34 2012 - server has new config for me ... HUPing the parent
> Mon Jul 16 15:30:34 2012 - Sent data to Server and got new config in response.
> Is it possible to stop the slaves from killing smokeping on the first 500 error? Or have it automatically restart? I’d prefer not to have to go around and restart the slave processes each time a config error is made.
> Thanks all.
> Jason Yates
> Network Engineer
Debbie Fligor, n9dn Lead Network Engineer for CITES @ Univ. of Il
email: fligor at illinois.edu <http://www.uiuc.edu/ph/www/fligor>
"Every keystroke can be monitored. And the computers never forget."