[smokeping-users] Severe lag when restarting Smokeping and webpage timing out.

Brett Bronson brett.bronson at bigblockla.com
Wed Mar 5 02:59:53 CET 2014


Hey Greg,

I took your advice and disabled AppArmor, but that still doesn't help. This
box is definitely not doing anything I/O intensive. It's also a relatively
powerful box (12 cores, 24 GB RAM). It's basically a cron server and web
server for a local intranet used by maybe 20 people at the moment, but it
will be used for more CPU-intensive work later down the line. I watched top
while loading the page and the footprint was minimal.

I didn't realize I had slaves entered in the slave section; I was just
using the template config, so I removed those. No slaves are configured for
this.

I debugged this further. After I raised the default timeout, I found that
once apache2 was restarted, the initial page load took longer and longer as
I increased the number of Targets; however, once the page had loaded, it
would refresh almost instantly. I also upgraded my local install of
RRDTool, but that doesn't seem to help the initial load time.
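
To put numbers on the cold versus warm page loads described above, a small timing loop could help. This is only a sketch: the URL below is a placeholder (the CGI path is an assumption, not taken from the config), and `fetch` and `time_requests` are illustrative names.

```python
import time
import urllib.request

# Hypothetical URL; substitute the address of your own Smokeping CGI.
SMOKEPING_URL = "http://pipeline/smokeping/smokeping.cgi"


def fetch(url, timeout=300):
    """Fetch the page and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def time_requests(fetch_fn, url, count=3):
    """Call fetch_fn(url) `count` times and return the elapsed seconds per call."""
    timings = []
    for _ in range(count):
        start = time.perf_counter()
        fetch_fn(url)
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_requests(fetch, SMOKEPING_URL)` right after restarting apache2 should give the cold load as the first value and the warm loads after it. Repeating the run with different numbers of Targets would show how the first value scales.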




On Tue, Mar 4, 2014 at 4:51 PM, Gregory Sloop <gregs at sloop.net> wrote:

>  [Tue Mar 04 15:15:36 2014] [warn] [client 192.168.1.66] mod_fcgid: read
> data timeout in 40 seconds, referer: http://pipeline/
>
> Looks like the smokeping cgi times out reading data.
>        Is this box I/O bound?
>        What does top show when you try to get a web-page from SP? [load
> averages in particular]
>
> In any case, you need to figure out why the CGI is failing to read the
> data in the allowed time of 40 secs.
> Changing the default time-out might help if the box is I/O bound, but not
> totally buried. [And I'm not sure where that might be.]
>
> However, if the box is seriously overloaded I/O wise, then waiting longer
> won't really solve your problem - it will just push the box further below
> the water.
> [And this all gets back to - how many RRD's and how big are they. See the
> database section. Are there slaves? If so, how many?]
>
> Finally:
>
> >Is fping being run as soon as the CGI script is executed from the
> webserver?
>
> You appear to misunderstand how SP works. The daemon runs fping and logs
> the results and writes to the RRD's. The CGI pulls data from the RRD and
> generates graphs for the http output.
>
> It appears from the debug log from SP that writing the data went fine. [At
> least for the small subset of targets.]
> However reading the RRD's and generating the graphs appears to
> fail/timeout when reading the RRD's. [Or reading something - in any case.]
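
One way to check whether plain reads of the RRD files are slow is to time a full read of each file. A minimal sketch, assuming the files sit under a single data directory (the directory is whatever `datadir` points to in your config; `time_rrd_reads` is an illustrative name, not part of Smokeping or RRDtool):

```python
import os
import time


def time_rrd_reads(data_dir):
    """Return (path, seconds, bytes) for a full read of every .rrd file under data_dir."""
    results = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(root, name)
            start = time.perf_counter()
            with open(path, "rb") as handle:
                size = len(handle.read())
            results.append((path, time.perf_counter() - start, size))
    return results
```

Files that were read recently are usually served from the page cache, so only the first run after a reboot or a long idle period reflects real disk latency. This measures raw file reads only, not the work RRDtool does to render a graph.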
>
> Is SELinux or AppArmor running? If so, then stop them or run in
> permissive mode and see if that helps.
>
>
> -Greg
>
>
>  Forgot to add the smoke.log:
>
> http://pastebin.com/20UbvJVx
>
> At the bottom of the log you can see that I also tried timing fping (the
> same command that smokeping was running) and it looks like it took 19.3
> seconds to run for a small number of machines. Would that cause it to time
> out? Is fping being run as soon as the CGI script is executed from the
> webserver?
>
>
>
> On Tue, Mar 4, 2014 at 4:10 PM, Brett Bronson <
> brett.bronson at bigblockla.com> wrote:
> Here is the apache error log that is listing smokeping:
> http://pastebin.com/Knm1Cmw1
>
> As for debug mode, here's my output:
> http://pastebin.com/8txnhnkv
>
> The host names do resolve; here's an example:
> [04:07 PM]superuser at pipeline[/opt/smokeping/bin] > time fping larender001a
> larender001a is alive
>
> real    0m0.014s
> user    0m0.000s
> sys     0m0.000s
>
>
>
> On Tue, Mar 4, 2014 at 3:32 PM, Brett Bronson <
> brett.bronson at bigblockla.com> wrote:
> Also, it looks like the version I have running is actually the latest; I
> assumed it would output the version as 2.6.9. Sorry
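
The "2.006009" in the log and release 2.6.9 appear to be the same version written two ways: Perl modules commonly use a decimal form with three digits per component. A minimal sketch of that conversion, assuming this convention applies here (`decode_perl_version` is an illustrative name):

```python
def decode_perl_version(raw):
    """Convert a Perl-style decimal version such as '2.006009' to dotted form."""
    major, _, fraction = raw.partition(".")
    # Pad the fraction on the right to a multiple of three digits.
    if len(fraction) % 3:
        fraction += "0" * (3 - len(fraction) % 3)
    # Each group of three digits is one version component.
    parts = [int(fraction[i:i + 3]) for i in range(0, len(fraction), 3)]
    return ".".join([str(int(major))] + [str(p) for p in parts])


# decode_perl_version("2.006009") returns "2.6.9"
```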
>
>
> On Tue, Mar 4, 2014 at 3:29 PM, Brett Bronson <
> brett.bronson at bigblockla.com> wrote:
> Okay, it looks like I was actually using an older version of smokeping.
> I've removed it and installed the latest version on the site and my config
> is as follows:
> http://pastebin.com/ZsLE8uCp
>
> Before, I was able to get smokeping to work fine up until I added the
> section:
>
> + nodes
> menu = Render Node Latency
> title = Render Node Latency (ICMP Pings)
>
> ++ larender001a
> host = larender001a
> ++ larender001b
> host = larender001b
> ++ larender001c
> host = larender001c
> ++ larender001d
> host = larender001d
>
> ++ larender002a
> host = larender002a
> ++ larender002b
> host = larender002b
> ++ larender002c
> host = larender002c
> ++ larender002d
> host = larender002d
>
>
>
> Now that I look at the logs, it looks like it's still using the old
> version....
> [ ... ]
> Tue Mar  4 15:03:05 2014 - FPing: probing 5 targets with step 300 s and
> offset 116 s.
> Tue Mar  4 15:16:01 2014 - Smokeping version 2.006009 successfully
> launched.
> Tue Mar  4 15:16:01 2014 - Not entering multiprocess mode for just a
> single probe.
> Tue Mar  4 15:16:01 2014 - FPing: probing 13 targets with step 300 s and
> offset 163 s.
> Tue Mar  4 15:25:59 2014 - Smokeping version 2.006009 successfully
> launched.
> Tue Mar  4 15:25:59 2014 - Not entering multiprocess mode for just a
> single probe.
> Tue Mar  4 15:25:59 2014 - FPing: probing 13 targets with step 300 s and
> offset 159 s.
>
> Before, I used sudo apt-get install smokeping to install, but I later
> removed it using sudo apt-get remove smokeping; however, it looks like it
> didn't remove the old version? Any idea how I could resolve this so that it
> loads up the newer version?
>
>
>
>
>
> On Tue, Mar 4, 2014 at 2:28 PM, Gregory Sloop <gregs at sloop.net> wrote:
> I don't see a database section, so I assume it's somewhere else. [Nothing
> looks obviously wrong - but that was just a quick glance.]
>
> But when you first start SP after adding a bunch of targets, it's going to
> have to allocate/create the RRD for each of the targets.
> [Also, are there slaves, because it will create X * 60 new RRD's - where X
> is how many slave SP instances you have. (In addition to the master RRD's) ]
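
The arithmetic above can be written out as a quick check. This is a sketch under the assumption stated in the message (one RRD per target on the master, plus one more per target for each slave); `expected_rrd_files` is an illustrative name.

```python
def expected_rrd_files(targets, slaves=0):
    """One RRD per target on the master, plus one per target for each slave."""
    if targets < 0 or slaves < 0:
        raise ValueError("targets and slaves must be non-negative")
    return targets * (1 + slaves)


# expected_rrd_files(60) returns 60 (no slaves)
# expected_rrd_files(60, slaves=2) returns 180 (60 master + 2 * 60 slave RRDs)
```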
>
> I wouldn't think that would take 10m, but I can't see how much data you're
> stuffing in each RRD, or if you have slaves, which might help explain it.
>
> As to why web-pages won't work, I'm not sure. Have you looked at the
> apache logs to see what they say? Or run SP in debug mode?
> [smokeping --debug IIRC]
>
> -Greg
>
>
>  Hello,
>
> I recently updated my smokeping Target configuration to include about 60
> of our machines in our render farm and noticed that restarting the
> smokeping service took about 10 minutes, and now our webpage will not load.
>
> Any ideas?
>
> My config:
> http://pastebin.com/ibNmGhAF
>
>
> --
> Brett Bronson
> Big Block | Pipeline TD
> http://www.bigblockla.com
> [m] 805-338-6520
>
> -- Gregory Sloop, Principal: Sloop Network & Computer Consulting
> Voice: 503.251.0452 x82
> EMail: gregs at sloop.net
> http://www.sloop.net
>



-- 
Brett Bronson
Big Block | Pipeline TD
http://www.bigblockla.com
[m] 805-338-6520