[mrtg] Re: Periodic hangs on snmp requests

Tue Jan 25 18:08:52 MET 2005

I should also add that in scanning the list archives, I see that Chris
Conn reported something that looks somewhat similar on Dec 31 '04:

--------------------snip----------------------
Hello,

For some reason since I have upgraded to a newer RAID firmware on my 
SCSI controller, some mrtg processes hang indefinitely and need to be 
killed manually.  The rest of the server seems fine.  Other than 
claiming there is a stale lock file, the next polls continue without 
problem, and when I kill the stale process I get an email with

ERROR: Bailout after SIG TERM

While I investigate this phenomenon, is there a way to set the maximum 
execution time of either the mrtg process or the perl execution?

Thanks in advance,

Chris
--------------------snip----------------------
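
On Chris's question about capping execution time: one generic approach is
to launch mrtg from a small watchdog that kills the whole process group
once a deadline passes. The following is only a minimal sketch of that
idea, not anything mrtg itself provides; the function name
run_with_deadline and the deadline value are my own assumptions.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd; if it exceeds the deadline, kill its whole process group.

    Returns the exit status, or None if the command had to be killed.
    """
    # start_new_session=True puts the child in its own session and process
    # group, so forked children and grandchildren can be signalled together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        # The child is its own group leader, so its pid is also the group id.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

A call might look like run_with_deadline(["mrtg", "/etc/mrtg/mrtg.cfg"], 240),
where the config path is hypothetical and the deadline is chosen to be
shorter than the polling interval. Killing the group rather than the single
pid matters here because the hung process is a grandchild of the one that
was started.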

On Tue, Jan 25, 2005 at 10:44:22AM -0600, Larry Fahnoe wrote:
> Hello,
> 
> From time to time I'm finding that mrtg will hang on snmp requests.
> The processes will never die (until I manually kill them) and if the
> processes are not killed, they will eventually collect to the point
> that virtual memory is exhausted.  This is happening on Red Hat
> Enterprise Linux release 3 which is kept current with patches from Red
> Hat.  mrtg is 2.11.0, rrdtool is 1.0.49, and perl is 5.8.0.
> 
> I've been seeing this problem almost exclusively with a bunch of
> Nortel and Cisco switches; the routers do not cause the problem.  I
> have not (yet) isolated it to a particular switch, but I don't think
> it is just one switch that is causing the problem.  What I typically
> see is three mrtg processes in a group that are hung.  Here is a
> recent example, using strace to see what the parent, child, and
> grandchild processes are doing:
> 
> # strace -v -p 14914   [parent process]
> wait4(-1,  <unfinished ...>
> 
> # strace -v -p 14916   [child process]
> select(16, [4], NULL, [4], NULL <unfinished ...>
> 
> # strace -v -p 15051   [grandchild process]
> recvfrom(4,  <unfinished ...>
> 
> # netstat -anp | grep 15051
> udp        0      0 0.0.0.0:40692           0.0.0.0:*                           15051/perl
> 
> Upon killing the grandchild, I get the following in the log:
> 
> ERROR: Bailout after SIG TERM
> ERROR: fork 0 has died ahead of time ...
> Command exited with non-zero status 29
> 16.80user 0.65system 21:55:23elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (719major+35652minor)pagefaults 0swaps
> 
> I have been seeing this off and on for several months with different
> versions of mrtg and perl.  My thought is that the snmp request
> timeouts are not being honored but beyond that I'm stumped.  Any
> insight into what might be happening here?
> 
> 
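
For reference, the strace output above shows select() called with a NULL
timeout and recvfrom() blocked on a UDP socket, so neither call will
return unless a datagram arrives. A bounded receive would look roughly
like the sketch below. This is a generic illustration in Python, not the
code mrtg or its SNMP library uses; recv_with_timeout is a name I made up.

```python
import socket


def recv_with_timeout(sock, timeout_seconds, bufsize=65535):
    """Wait up to timeout_seconds for one datagram; return None on timeout."""
    # With a timeout set, recvfrom() raises instead of blocking forever
    # when the remote device never answers.
    sock.settimeout(timeout_seconds)
    try:
        data, _addr = sock.recvfrom(bufsize)
        return data
    except socket.timeout:
        return None
```

The caller can then retry or give up on that device when None comes back,
rather than leaving a process parked in recvfrom() indefinitely.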

-- 
Larry Fahnoe, Fahnoe Technology Consulting, fahnoe at FahnoeTech.com
952/925-0744      Minneapolis, Minnesota       www.FahnoeTech.com 
