[smokeping-users] Gaps in Graphs
scmoseman at gmail.com
Mon Nov 12 22:37:28 CET 2007
Just as I started typing a reply, I saw the following message display
in my syslog...
WARNING: smokeping took 301 seconds to complete 1 round of polling. It
should complete polling in 300 seconds. You may have unresponsive
devices in your setup.
I'm going to assume that Smokeping does not write ANY data to the RRDs
if it cannot complete the polling for EVERY device in the config. Is
that accurate? Other than seeing these warnings when it fails, is
there any way to see how long it's taking for the polls that succeed,
so I can see how we're doing?
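For the rounds that do overrun, the warning itself carries both numbers, so
a small script can at least show how often and by how much the round runs
long. The following is only a rough sketch: it assumes syslog writes the
warning on a single line with exactly the wording quoted above, and the
names WARNING_RE and find_overruns are made up for illustration. It says
nothing about rounds that finished in time, since no message is logged for
those.

```python
import re

# Pattern built from the warning text quoted above.
# Assumption: syslog emits the whole warning on one line.
WARNING_RE = re.compile(
    r"smokeping took (?P<took>\d+) seconds to complete \d+ rounds? of polling\."
    r"\s+It should complete polling in (?P<budget>\d+) seconds"
)


def find_overruns(lines):
    """Return a (took, budget, overrun) tuple for each matching warning line."""
    overruns = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            took = int(match.group("took"))
            budget = int(match.group("budget"))
            overruns.append((took, budget, took - budget))
    return overruns
```

Passing the open syslog file (or any iterable of log lines) to
find_overruns gives one tuple per warning. A round that keeps landing a
second or two over the 300-second budget would point to the config being
right at the limit, while large or irregular overruns would point more
toward unresponsive devices.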
On Nov 12, 2007 1:58 PM, Peter Kristolaitis <alter3d at alter3d.ca> wrote:
> Hi Scott;
> The first thing I would check would be to see if any new devices have been
> added to your SmokePing config around the time you started having problems.
> If so, check the SmokePing logs for warnings that look like "Warning:
> Polling took longer than the interval step." or something similar.
> What could be happening is that at some point, you had X devices monitored,
> and they took, for example, 298 seconds to scan. If you added another
> device, all of a sudden it might take 302 seconds to scan. If you had your
> scan cycles set to 5 minutes (300 seconds), this means that SmokePing can't
> complete a round of scanning before another one starts. This could
> definitely cause the problems you've been seeing.
> If this is the case, the solution is to either lengthen the scan cycle,
> remove some hosts, increase concurrency (although I don't think that's
> supported in 1.x?), or upgrade to the current SmokePing series and implement
> multiple monitors and/or master/slave.
> Scott Moseman wrote:
> We're running Smokeping 1.34. Yes, I'm aware it's old, but it's been
> deployed forever and it's been working fine. Lately we've been having
> some weird issues with missed polls. I have attached a sample showing
> the last 3 hours and it includes 3 gaps. This happens across
> every device in the Smokeping config. We have a script that runs
> every 5 minutes to update the config if there are new entries, and part
> of that process is to verify the process is running (and restart, or
> start, if necessary). It logs what's going on. When the config
> updates and Smokeping restarts, there's never a gap. According to my
> scripts, and looking at the age of the Smokeping process that's
> running, these gaps were NOT caused by Smokeping having failed
> execution. Also, I set up a ping tool to monitor the switch, router
> and an external address every SECOND for a while. There was never a
> lack of connectivity during these gaps in Smokeping. Is there any
> means to troubleshoot? I will enable the syslog function to see if it
> provides any details about what's going on.