[mrtg-developers] Re: [mrtg] Periodic hangs on snmp requests
Larry Fahnoe
fahnoe at FahnoeTech.com
Tue Jan 25 18:08:52 MET 2005

I should also add that in scanning the list archives, I see that Chris
Conn reported something that looks somewhat similar on Dec 31 '04:
--------------------snip----------------------
Hello,
For some reason since I have upgraded to a newer RAID firmware on my 
SCSI controller, some mrtg processes hang indefinitely and need to be 
killed manually.  The rest of the server seems fine.  Other than 
claiming there is a stale lock file, the next polls continue without 
problem, and when I kill the stale process I get an email with
ERROR: Bailout after SIG TERM
While I investigate this phenomenon, is there a way to set the maximum 
execution time of either the mrtg process or the perl execution?
Thanks in advance,
Chris
--------------------snip----------------------
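
One way to get a maximum execution time without touching mrtg itself is
to start it from a small watchdog wrapper. The following is only a
minimal sketch in Python, assuming a POSIX system; the helper name
run_with_deadline is hypothetical and not part of mrtg. It starts the
command in its own process group so that forked children and
grandchildren are signalled together with the parent.

```python
import os
import signal
import subprocess


def run_with_deadline(cmd, limit_seconds, grace_seconds=5):
    """Run cmd; return its exit code, or None if it had to be killed.

    Hypothetical helper (POSIX only), not part of mrtg.
    """
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose id equals its pid.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # Ask the whole group (parent, child, grandchild) to terminate.
        os.killpg(proc.pid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: force it.
            os.killpg(proc.pid, signal.SIGKILL)
            proc.wait()
        return None
```

From cron this could be called as, for example,
run_with_deadline(["mrtg", "/path/to/mrtg.cfg"], 240), where the config
path is a placeholder and the limit is chosen to be shorter than the
polling interval so that hung runs cannot pile up.
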
On Tue, Jan 25, 2005 at 10:44:22AM -0600, Larry Fahnoe wrote:
> Hello,
> 
> From time to time I'm finding that mrtg will hang on snmp requests.
> The processes will never die (until I manually kill them) and if the
> processes are not killed, they will eventually collect to the point
> that virtual memory is exhausted.  This is happening on Red Hat
> Enterprise Linux release 3 which is kept current with patches from Red
> Hat.  mrtg is 2.11.0, rrdtool is 1.0.49, and perl is 5.8.0.
> 
> I've been seeing this problem almost exclusively with a bunch of
> Nortel and Cisco switches; the routers do not cause the problem.  I
> have not (yet) isolated it to a particular switch, but I don't think
> it is just one switch that is causing the problem.  What I typically
> see is three mrtg processes in a group that are hung.  Here is a
> recent example, using strace to see what the parent, child, and
> grandchild processes are doing:
> 
> # strace -v -p 14914   [parent process]
> wait4(-1,  <unfinished ...>
> 
> # strace -v -p 14916   [child process]
> select(16, [4], NULL, [4], NULL <unfinished ...>
> 
> # strace -v -p 15051   [grandchild process]
> recvfrom(4,  <unfinished ...>
> 
> # netstat -anp | grep 15051
> udp        0      0 0.0.0.0:40692           0.0.0.0:*                           15051/perl
> 
> Upon killing the grandchild, I get the following in the log:
> 
> ERROR: Bailout after SIG TERM
> ERROR: fork 0 has died ahead of time ...
> Command exited with non-zero status 29
> 16.80user 0.65system 21:55:23elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (719major+35652minor)pagefaults 0swaps
> 
> I have been seeing this off and on for several months with different
> versions of mrtg and perl.  My thought is that the snmp request
> timeouts are not being honored, but beyond that I'm stumped.  Any
> insight into what might be happening here?
> 
> 
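
In the trace above the grandchild is blocked in recvfrom() on its UDP
socket, which is what a receive with no effective timeout looks like
when the reply never arrives. For comparison, here is a minimal sketch
in Python of a UDP request that gives up after a bounded number of
tries. It is a generic illustration, not SNMP and not mrtg's actual
code; the function name udp_request and its default values are
hypothetical.

```python
import socket


def udp_request(host, port, payload, timeout_seconds=2.0, retries=3):
    """Send payload over UDP and return the first reply, or None.

    Hypothetical helper for illustration; a real SNMP client would also
    match request ids before accepting a reply.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # With a timeout set, recvfrom() raises socket.timeout instead of
        # blocking forever when no datagram arrives.
        sock.settimeout(timeout_seconds)
        for _ in range(retries):
            sock.sendto(payload, (host, port))
            try:
                data, _addr = sock.recvfrom(65535)
                return data
            except socket.timeout:
                continue  # no answer within the timeout: resend
        return None
    finally:
        sock.close()
```

With these defaults the call returns after roughly
timeout_seconds * retries at the latest, so a device that stops
answering costs a bounded amount of time per poll.
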
-- 
Larry Fahnoe, Fahnoe Technology Consulting, fahnoe at FahnoeTech.com
952/925-0744      Minneapolis, Minnesota       www.FahnoeTech.com 