[mrtg] MRTG multiple processes
Matthew Petach
mpetach at yahoo.com
Fri Mar 12 10:15:54 CET 2010
----- Original Message ----
> From: Pavel Ruzicka <pavouk at pavouk.org>
> To: mrtg at lists.oetiker.ch
> Sent: Thu, March 11, 2010 6:31:29 AM
> Subject: [mrtg] MRTG multiple processes
>
> Hello,
>
> I have trouble when several boxes are unreachable and MRTG doesn't
> finish within the 5-minute interval.
> This is the output from ps:
> root      3012  0.0  0.0  74840   1172 ?   Ss   Jan22   0:05 crond
> mrtg      7664  0.0  0.0 102020   1536 ?   S    15:05   0:00  \_ crond
> mrtg      7668  1.0  3.0 325848 251420 ?   Ss   15:05   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7687  0.0  2.5 292360 210988 ?   S    15:05   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7689  0.0  0.0  57684   3600 ?   S    15:05   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7709  0.0  0.0 102020   1536 ?   S    15:10   0:00  \_ crond
> mrtg      7713  1.4  3.0 325848 251416 ?   Ss   15:10   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7732  0.0  2.5 292360 210988 ?   S    15:10   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7734  0.0  0.0  57684   3596 ?   S    15:10   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7740  0.0  0.0 102020   1536 ?   S    15:15   0:00  \_ crond
> mrtg      7744  2.3  3.0 325848 251424 ?   Ss   15:15   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7763  0.0  2.5 292360 210992 ?   S    15:15   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7765  0.0  0.0  57688   3596 ?   S    15:15   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7769  0.0  0.0 102020   1536 ?   S    15:20   0:00  \_ crond
> mrtg      7773  6.5  3.0 325320 250856 ?   Ss   15:20   0:11      \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7792  0.2  2.5 292360 210988 ?   S    15:20   0:00      |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7794  0.0  0.0  57688   3600 ?   S    15:20   0:00      \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
>
> I run MRTG from cron:
>
> */5 * * * * /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
>
> I know that MRTG checks whether another instance is already running.
> Do you have any idea why this doesn't work?
> Or how can I solve it?
>
> I have MRTG 2.16.2 on the CentOS 5 Linux distribution.
>
> Best regards,
> Pavel Ruzicka

Hi Pavel,

I found the lockfile handling in MRTG to be woefully inadequate, so
I wrote a wrapper for MRTG that does more robust device locking and
also makes sure the system load on the box isn't too high before it
starts a run. I then call the wrapper script from my crontab entries
instead of the mrtg script itself. I use a separate config file for
each device, with a separate crontab line for each one:
#
0-59/5 * * * * /home/mrtg/scripts/mrtg.wrapper /home/mrtg/cfg/pat1.dee.cfg
0-59/5 * * * * /home/mrtg/scripts/mrtg.wrapper /home/mrtg/cfg/pat2.dee.cfg
#
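The wrapper below implements its lock as a PID file that it first checks and then writes, so two runs that start at the same moment can both see "no lock". A different technique, not the one this wrapper uses, is an advisory flock(2) lock taken through Perl's built-in flock: the kernel grants it to one open file at a time and drops it automatically when the holding process exits, so there is never a stale lock to clean up. A minimal sketch, assuming a lock file that sits next to each config file and is named after it (pat1.dee.cfg becomes pat1.dee.lock); that naming and the helper names lock_path_for, try_lock and run_locked are hypothetical:

```perl
#!/usr/bin/perl
# Sketch only: per-config MRTG lock using flock(2) instead of a PID file.
# Assumption (hypothetical): lock file lives next to the config file,
# e.g. /home/mrtg/cfg/pat1.dee.cfg -> /home/mrtg/cfg/pat1.dee.lock
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Basename qw(fileparse);

# Derive "<dir>/<name>.lock" from "<dir>/<name>.cfg".
sub lock_path_for {
    my ($cfg) = @_;
    my ($name, $dir) = fileparse($cfg, qr/\.cfg/);
    return "$dir$name.lock";
}

# Take an exclusive, non-blocking lock on $path. Returns the open
# filehandle on success (the lock lasts while it stays open), or
# undef if another open handle already holds the lock.
sub try_lock {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR | O_CREAT, 0644)
        or die "Couldn't open $path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Run mrtg on one config unless an earlier run still holds its lock.
# Returns 0 on success or skip, 1 if mrtg itself failed.
sub run_locked {
    my ($mrtg, $cfg, @extra) = @_;
    my $lock = try_lock(lock_path_for($cfg));
    if (!$lock) {
        warn "$cfg is still locked by an earlier run; skipping\n";
        return 0;
    }
    # List form of system(): the shell never re-parses the paths.
    my $status = system($mrtg, $cfg, @extra);
    close($lock);    # releases the lock; process exit would as well
    return $status == 0 ? 0 : 1;
}

# Only act when invoked with arguments, e.g. from a crontab line.
exit(run_locked('/home/mrtg/bin/mrtg', @ARGV)) if @ARGV;
```

Called from cron exactly like the wrapper in the crontab lines above, one line per config file. The lock file is deliberately never deleted: unlinking a flock-ed file while another run is opening it could let two runs lock two different files.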
And here's my wrapper script--feel free to hack it up into something
that works for you. ^_^;
Matt
mpetach at tftp1:/home/mrtg> cat scripts/mrtg.wrapper
#!/usr/bin/perl
#
# Simple wrapper to prevent multiple MRTG instances from
# trying to access the same data directory simultaneously,
# which leads to data corruption.
#
# It simply writes its own PID to an mrtg.<device>.lock file
# (named after the config file) in the data directory specified
# in the conf file given as the argument, runs mrtg with its
# normal command line, then removes the lock file upon finishing.
#
# If it already finds a lock file in the directory, it
# does a ps to see if the process is still running. If it
# isn't, it removes the file, and then proceeds as normal.
# If the process _is_ still running, it checks how long ago it
# was started. If it's less than $max_runtime seconds old, it
# lets it finish, and quietly exits on its own.
# If the previous process is older than $max_runtime, however, it
# should go about killing it nicely, since the data will be
# too old to be of use. I need to work on this part, it isn't
# finished yet. --MNP 8-2-97
#
# O.k., I think I've got the "killing the OLD process part" pretty
# close to done now. --MNP 01-05-98
#
# OK. And now it watches system load as well.
#
# Tune these to match your OS
#-----------------------------------------------------------------------
# GLOBALS
require 5.004;
use strict;
BEGIN {
  $ENV{"PATH"}="/home/mrtg/scripts:/home/mrtg/bin:/bin:/usr/bin:/usr/sbin:/sbin";
}
# flush NOW!
$| = 1 ;
my($mrtg) = "/home/mrtg/bin/mrtg";
# DEBUG levels:
# 0 = hard errors only
# 1 = warnings, recoverable errors
# 2 = process flow diagnostics
# 3 = subroutine level debugging
# 4 = everything (I/O included)
my($DEBUG)=0;
my($config_file)=$ARGV[0];
my($max_load) = "25.0";
# extended runtime limit for slow datacenters
#my($max_runtime) = 360;
my($max_runtime) = 900;
# Now set for each type of run, at bottom. --MNP 01-09-98
# $lockfname = "mrtg_gif_gen.lock";
my($ps) = "/bin/ps";
# my($psflags) = "-ef";
my($psflags) = "-ufwwwp";
# my($PIDLINE) = "PPID";
my($PIDLINE) = "PID";
my($grep) = "/usr/bin/grep";
my($grepflags) = "";
my($head) = "/usr/bin/head";
my($headlines) = "-1";
my($basename) = "/usr/bin/basename";
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
sub errorexit {
  print ("$0: Error: @_.\n");
  exit(21);
}
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
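sub get_load {
  # Linux uptime prints "load average: 0.15, 0.10, 0.05", while BSD
  # systems print "load averages:"; match either form and return the
  # 1-minute average (0 if the output can't be parsed).
  my($wub)="";
  $wub=`/usr/bin/uptime`;
  &debug("uptime said: $wub",4);
  my($one,$two,$three) =
    ($wub =~ /load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/);
  if (! defined $one) {
    &debug("Couldn't parse load average from: $wub",1);
    return(0);
  }
  &debug("Values were: $one, $two, $three",3);
  return($one);
}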
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
sub GetDirectory {
  my($conffile)=@_;
  my($directory) = "";
  open(CONFIGFILE,"$conffile") || die "Couldn't open file $conffile: $!";
  &debug("About to work on $conffile",2);
  while (<CONFIGFILE>) {
    &debug("Working on line: $_",4);
    # Accept both "WorkDir: /path" and "WorkDir:/path".
    if ( /^WorkDir:\s*(\S+)/ ) {
      $directory = $1;
      &debug("WorkDir is: $directory",2);
    }
  }
  close(CONFIGFILE);
  if ( $directory eq "" ) {
    die "Couldn't find a WorkDir for config file $conffile";
  }
  return($directory);
}
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
sub PIDexists {
  my($pid) = @_;
  my($result);
  # my($cmd_str) = "$ps $psflags | $grep -v $grep | $grep $grepflags $pid";
  my($cmd_str) = "$ps $psflags $pid";
  &debug("About to do: $cmd_str",3);
  # $result=`$cmd_str`;
  $result=system("$cmd_str");
  # returns 0 upon finding the PID, non-zero (e.g. 256) upon no match.
  if ($result==0) {
    # PID exists
    return 1;
  } else {
    # PID doesn't exist
    return 0;
  }
}
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
# Check for lockfile in a given directory; if none, return false,
# otherwise check to see if the process is still running; if not,
# remove it and return false, otherwise return true
sub IsDirectoryLocked {
  my($workdir,$lockfile) = @_;
  my($cmd_str,$oldpid,$result);
  &debug("Checking for lock in: $workdir",2);
  # $lockfile=$workdir."/mrtg.lock";
  &debug("opening lock file $lockfile",2);
  # if I can't stat it, chances are it doesn't exist
  my($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
     $atime,$mtime,$ctime,$blksize,$blocks) = stat $lockfile or return 0;
  open(OLDPID,"$lockfile") || die "Couldn't open old PID file $lockfile: $!";
  $oldpid=<OLDPID>;
  close OLDPID;
  $oldpid = "" unless defined $oldpid;
  chomp($oldpid);
  # $cmd_str= "$ps $psflags | $grep -v $grep | $grep $grepflags $oldpid";
  $cmd_str= "$ps $psflags $oldpid | $grep -v $PIDLINE";
  if ($oldpid =~ /^\d+$/) {
    &debug("About to do: $cmd_str",3);
    $result=`$cmd_str`;
  } else {
    # Without a PID, ps would list unrelated processes; treat the
    # lock file as orphaned instead.
    &debug("$lockfile exists, but oldpid is '$oldpid'",0);
    $result="";
  }
  if ((! $result) and (! -e $lockfile )) {
    # lockfile vanished since I tried to do the PS, and there's no
    # running process--yay, we're golden!
    return 0;
  }
  if (! $result) {
    # since we didn't match the above clause, must mean there's
    # no process, but we still have a lockfile--deal with it.
    &debug("Caught ORPHANED lock file: unlinking $lockfile",0);
    &debug("command was: $cmd_str",0);
    &debug("LS before unlink attempt:",0);
    system "/bin/ls","-al","$lockfile";
    my($ul_res)=unlink $lockfile;
    &debug("LS AFTER unlink attempt:",0);
    ##system "/bin/ls","-al","$lockfile";
    &debug("UNLINK of $lockfile failed--result $ul_res",0) if ($ul_res != 1);
    return 0 if ($ul_res eq 1);
    return 3;
  }
  &debug("Result was: $result",2);
  # We had a lockfile, and that process is still running; how old is it?
  my($current)=time;
  if ($ctime < 1000) {
    &debug("STAT error on $lockfile--CTIME is $ctime",0);
    &debug("using MTIME instead: $lockfile--MTIME is $mtime",0);
    $ctime=$mtime;
    if ($mtime < 1000) {
      &debug("CRAP--mtime is bogus too--trying ls -al $lockfile",0);
      system "/bin/ls","-al","$lockfile";
    }
  }
  my($deltatime)= $current - $ctime;
  if ( $deltatime < $max_runtime ) {
    # Young file, quietly exit.
    return 1;
  }
  # Otherwise, it's an old file, and we should do something noisy to
  # let the user know a process may be hung. Later, we'll add code to
  # actually kill the process.
  if ($deltatime > $max_runtime) {
    &debug("In mrtg.wrapper, directory $workdir has been LOCKED for more than $max_runtime seconds ($deltatime seconds)(Time is $current, Ctime is $ctime), but is NOT orphaned!",0);
  }
  # Let's try killing it:
  # my($count) = 0;
  # my($pidexists) = 1;
  # while ($pidexists == 1) {
  #   print STDERR "KILLING $oldpid, COUNT $count\n";
  #   if ($count < 3 ) {
  #     kill 1, $oldpid;
  #   } else {
  #     kill 9, $oldpid;
  #   }
  #   sleep 20;
  #   $count++;
  #   $pidexists = &PIDexists($oldpid);
  #   $pidexists = 0 if ($count > 5);
  # }
  # unlink $lockfile;
  return 2;
}
#-----------------------------------------------------------------------
#-----------------------------------------------------------------------
sub debug {
  my ($message, $level) = @_;
  $level = 0 unless $level;
  if ($level <= $DEBUG) {
    my ($i);
    print STDERR "D: ";
    for ($i = 0; $i < $level; $i++) {
      print STDERR " ";
    }
    print STDERR "$message\n";
  }
}
#-----------------------------------------------------------------------
# MAIN
#-----------------------------------------------------------------------
&debug("My config file should be: $config_file",1);
my($mydir) = &GetDirectory($config_file);
my($cmd_str) = "$basename $config_file .cfg";
&debug("About to do $cmd_str",2);
my($result) = `$cmd_str`;
chomp($result);
&debug("after chomp, result was: $result",2);
my($device)="$result";
my($mypidfile)="$mydir/mrtg.$device.lock";
&debug("global PIDFILE is $mypidfile",3);
if ( $mydir eq "" ) { die "Couldn't get working directory $!"; }
my($res)=&IsDirectoryLocked($mydir,$mypidfile);
if ($res) {
  &debug("Directory $mydir is LOCKED! $mypidfile--skipping this run",1);
} else {
  my($sleep_count) = 0;
  my($load_high) = 0;
  my($cur_load) = &get_load;
  LOAD: while($cur_load >= $max_load) {
    &debug("Load of $cur_load is over $max_load, sleep count is $sleep_count, and we need to do $mydir: $device",1);
    $load_high=1;
    if ($sleep_count > 10) {
      &debug("Not sure if the die is being caught or not...",2);
      die "Can't proceed, load is $cur_load, max_load is $max_load -- still too high after 110 seconds of waiting";
    }
    sleep 10;
    $sleep_count++;
    $cur_load = &get_load;
  }
  &debug("We can proceed; load average passed",1) if ($load_high);
  open(PIDFILE,">$mypidfile") || die "Couldn't open $mypidfile: $!";
  print PIDFILE $$;
  close(PIDFILE);
  # Pass the arguments as a list so the shell never re-splits them
  # (e.g. config paths containing spaces).
  system($mrtg, @ARGV);
  unlink $mypidfile;
}
#-----------------------------------------------------------------------
mpetach at tftp1:/home/mrtg>
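The header comments describe how the wrapper decides whether an existing lock is still valid: is the recorded PID alive, and if so, how old is the lock? A minimal sketch of that decision in plain Perl, assuming the same lock file format (just the PID) and the same return codes as IsDirectoryLocked. It swaps the external ps and grep calls for Perl's built-in kill with signal 0, which delivers nothing and only reports whether the process can be signalled, and it measures age from the file's mtime rather than its ctime. The names pid_alive and lock_state are hypothetical:

```perl
use strict;
use warnings;
use Errno qw(EPERM);

# True if a process with this PID exists. Signal 0 is never delivered;
# kill only checks whether the PID could be signalled. EPERM means the
# process exists but belongs to another user.
sub pid_alive {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;
    return 1 if kill(0, $pid);
    return $! == EPERM ? 1 : 0;
}

# Classify a PID lock file using the wrapper's return codes:
#   0 = no lock (file absent or unreadable, or orphaned and removed)
#   1 = owner alive, lock younger than $max_runtime seconds
#   2 = owner alive, lock at least $max_runtime seconds old
#   3 = orphaned, but the lock file could not be removed
sub lock_state {
    my ($lockfile, $max_runtime, $now) = @_;
    $now = time unless defined $now;
    open(my $fh, '<', $lockfile) or return 0;
    my $line = <$fh>;
    close($fh);
    # Pull the first run of digits out of the file, if any.
    my ($pid) = defined $line ? ($line =~ /(\d+)/) : ();
    if (!pid_alive($pid)) {
        return unlink($lockfile) ? 0 : 3;    # orphaned lock
    }
    my $mtime = (stat $lockfile)[9];
    return ($now - $mtime) < $max_runtime ? 1 : 2;
}
```

In the wrapper this would stand in for IsDirectoryLocked($mydir, $mypidfile). Like the ps-based check, it can be fooled if the old PID has since been reused by an unrelated process.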