[mrtg] Re: Periodic hangs on snmp requests
Larry Fahnoe
fahnoe at FahnoeTech.com
Tue Jan 25 18:08:52 MET 2005
I should also add that in scanning the list archives, I see that Chris
Conn reported something that looks somewhat similar on Dec 31 '04:
--------------------snip----------------------
Hello,
For some reason since I have upgraded to a newer RAID firmware on my
SCSI controller, some mrtg processes hang indefinitely and need to be
killed manually. The rest of the server seems fine. Other than
claiming there is a stale lock file, the next polls continue without
problem, and when I kill the stale process I get an email with
ERROR: Bailout after SIG TERM
While I investigate this phenomenon, is there a way to set the maximum
execution time of either the mrtg process or the perl execution?
Thanks in advance,
Chris
--------------------snip----------------------
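On Chris's question about capping execution time: I don't know of a setting for this in mrtg itself, so one possible stopgap is an external watchdog. The following is only a minimal sketch in Python, untested against mrtg. The function name, the grace period, and the choice to signal the whole process group (so that forked children and grandchildren are covered too) are my own assumptions.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, max_seconds, grace_seconds=5):
    """Run cmd and stop it if it is still running after max_seconds.

    Returns the exit status, or None if the deadline was hit.
    """
    # start_new_session=True puts the command in its own process group,
    # so any processes it forks can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # Ask the whole group to exit first (SIGTERM).
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: force it.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
        return None
```

From cron this might be called as run_with_deadline(["/usr/bin/mrtg", "/etc/mrtg/mrtg.cfg"], 240), where both paths and the 240-second limit are placeholders. It only hides the hang; it does not explain it.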
On Tue, Jan 25, 2005 at 10:44:22AM -0600, Larry Fahnoe wrote:
> Hello,
>
> From time to time I'm finding that mrtg will hang on snmp requests.
> The processes will never die (until I manually kill them) and if the
> processes are not killed, they will eventually collect to the point
> that virtual memory is exhausted. This is happening on Red Hat
> Enterprise Linux release 3 which is kept current with patches from Red
> Hat. mrtg is 2.11.0, rrdtool is 1.0.49, and perl is 5.8.0.
>
> I've been seeing this problem almost exclusively with a bunch of
> Nortel and Cisco switches; the routers do not cause the problem. I
> have not (yet) isolated it to a particular switch, but I don't think
> it is just one switch that is causing the problem. What I typically
> see is three mrtg processes in a group that are hung. Here is a
> recent example, using strace to see what the parent, child, and
> grandchild processes are doing:
>
> # strace -v -p 14914 [parent process]
> wait4(-1, <unfinished ...>
>
> # strace -v -p 14916 [child process]
> select(16, [4], NULL, [4], NULL <unfinished ...>
>
> # strace -v -p 15051 [grandchild process]
> recvfrom(4, <unfinished ...>
>
> # netstat -anp | grep 15051
> udp 0 0 0.0.0.0:40692 0.0.0.0:* 15051/perl
>
> Upon killing the grandchild, I get the following in the log:
>
> ERROR: Bailout after SIG TERM
> ERROR: fork 0 has died ahead of time ...
> Command exited with non-zero status 29
> 16.80user 0.65system 21:55:23elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (719major+35652minor)pagefaults 0swaps
>
> I have been seeing this off and on for several months with different
> versions of mrtg and perl. My thought is that the snmp request
> timeouts are not being honored, but beyond that I'm stumped. Any
> insight into what might be happening here?
>
>
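To illustrate the symptom in the strace output above (a process parked in recvfrom on a UDP socket), here is a generic sketch of a UDP request with an explicit timeout and retry count. It is written in Python for brevity and is not the code mrtg or its Perl SNMP library uses. The function name and parameters are hypothetical.

```python
import socket


def udp_request(host, port, payload, timeout_seconds, retries):
    """Send payload over UDP and wait for a reply.

    Returns the reply bytes, or None if every attempt timed out.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Without a timeout, recvfrom() blocks forever when the reply
        # is lost or never sent, which matches the hang seen in strace.
        sock.settimeout(timeout_seconds)
        for _ in range(retries):
            sock.sendto(payload, (host, port))
            try:
                data, _addr = sock.recvfrom(65535)
                return data
            except socket.timeout:
                continue  # no reply in time; resend and wait again
        return None
    finally:
        sock.close()
```

If the timeout is somehow not applied to the socket, a single dropped reply is enough to leave the process waiting indefinitely, which is consistent with what I am seeing but does not prove it is the cause.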
--
Larry Fahnoe, Fahnoe Technology Consulting, fahnoe at FahnoeTech.com
952/925-0744 Minneapolis, Minnesota www.FahnoeTech.com