[mrtg] MRTG multiple processes

Matthew Petach mpetach at yahoo.com
Fri Mar 12 10:15:54 CET 2010


----- Original Message ----

> From: Pavel Ruzicka <pavouk at pavouk.org>
> To: mrtg at lists.oetiker.ch
> Sent: Thu, March 11, 2010 6:31:29 AM
> Subject: [mrtg] MRTG multiple processes
> 
> Hello,
> 
> I run into trouble when several boxes are unreachable and MRTG doesn't
> finish within the 5-minute interval.
> 
> This is the output from ps:
> root      3012  0.0  0.0  74840   1172 ?  Ss   Jan22   0:05 crond
> mrtg      7664  0.0  0.0 102020   1536 ?  S    15:05   0:00  \_ crond
> mrtg      7668  1.0  3.0 325848 251420 ?  Ss   15:05   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7687  0.0  2.5 292360 210988 ?  S    15:05   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7689  0.0  0.0  57684   3600 ?  S    15:05   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7709  0.0  0.0 102020   1536 ?  S    15:10   0:00  \_ crond
> mrtg      7713  1.4  3.0 325848 251416 ?  Ss   15:10   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7732  0.0  2.5 292360 210988 ?  S    15:10   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7734  0.0  0.0  57684   3596 ?  S    15:10   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7740  0.0  0.0 102020   1536 ?  S    15:15   0:00  \_ crond
> mrtg      7744  2.3  3.0 325848 251424 ?  Ss   15:15   0:11  |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7763  0.0  2.5 292360 210992 ?  S    15:15   0:00  |   |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7765  0.0  0.0  57688   3596 ?  S    15:15   0:00  |   \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
> mrtg      7769  0.0  0.0 102020   1536 ?  S    15:20   0:00  \_ crond
> mrtg      7773  6.5  3.0 325320 250856 ?  Ss   15:20   0:11      \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7792  0.2  2.5 292360 210988 ?  S    15:20   0:00      |   \_ /usr/bin/perl -w /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> mrtg      7794  0.0  0.0  57688   3600 ?  S    15:20   0:00      \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t

> I run MRTG from cron:
> */5 * * * * /usr/local/mrtg/bin/mrtg /usr/local/mrtg/cfg/mrtg.cfg
> 
> I know that MRTG checks whether a concurrent process is already running.
> Do you have any idea why this doesn't work?
> Or how I can solve it?
> 
> I have MRTG 2.16.2 on the CentOS 5 Linux distribution.
> 
> Best regards,
> 
> Pavel Ruzicka



Hi Pavel,

I found the lockfile handling in MRTG to be woefully inadequate, so
I wrote a wrapper for MRTG that does more robust per-device locking
and checks that the system load on the box doesn't go too high.
I then call the wrapper script in my crontab entries rather than the
mrtg script itself.  I use a separate config file for each device,
with a separate line in the crontab file for each device, like this:

#
0-59/5 * * * * /home/mrtg/scripts/mrtg.wrapper /home/mrtg/cfg/pat1.dee.cfg
0-59/5 * * * * /home/mrtg/scripts/mrtg.wrapper /home/mrtg/cfg/pat2.dee.cfg
#


And here's my wrapper script--feel free to hack it up into something
that works for you.  ^_^;

Matt


mpetach at tftp1:/home/mrtg> cat scripts/mrtg.wrapper 
#!/usr/bin/perl
#
# Simple wrapper to prevent multiple MRTG instances from
# trying to access the same data directory simultaneously,
# which leads to data corruption.
#
# It simply writes the PID of itself to an mrtg.lock file
# in the data directory specified in the conf file given
# as the argument, runs mrtg with its normal command line,
# then removes the mrtg.lock file upon finishing.
#
# If it already finds an mrtg.lock file in the directory, it
# does a ps to see if the process is still running.  If it
# isn't, it removes the file, and then proceeds as normal.
# If the process _is_ still running, it checks how long ago the
# lock was taken.  If it's less than $max_runtime seconds old,
# it lets it finish, and quietly exits on its own.
# If the previous process is older than $max_runtime, however, it
# should go about killing it nicely, since the data will be
# too old to be of use.  I need to work on this part, it isn't
# finished yet.  --MNP  8-2-97
#
# O.k., I think I've got the "killing the OLD process part" pretty
# close to done now. --MNP  01-05-98
#
# OK.  And now it watches system load as well.
#
# Tune these to match your OS

#-----------------------------------------------------------------------
# GLOBALS
require  5.004;
use      strict;
 
BEGIN {
   $ENV{"PATH"}="/home/mrtg/scripts:/home/mrtg/bin:/bin:/usr/bin:/usr/sbin:/sbin";
}

# flush NOW!
$|                 = 1 ;

my($mrtg) = "/home/mrtg/bin/mrtg";

# DEBUG levels:
#   0 = hard errors only
#   1 = warnings, recoverable errors
#   2 = process flow diagnostics
#   3 = subroutine level debugging
#   4 = everything (I/O included)
my($DEBUG)=0;

my($config_file)=$ARGV[0];
&errorexit("usage: $0 config-file [mrtg options]") unless defined $config_file;

my($max_load)      = "25.0";
# extended runtime limit for slow datacenters
#my($max_runtime)   = 360;
my($max_runtime)   = 900;
# Now set for each type of run, at bottom.  --MNP  01-09-98
# $lockfname   = "mrtg_gif_gen.lock";
my($ps)            = "/bin/ps";
# my($psflags)       = "-ef";
my($psflags)       = "-ufwwwp";
# my($PIDLINE)       = "PPID";
my($PIDLINE)       = "PID";
my($grep)          = "/usr/bin/grep";
my($grepflags)     = "";
my($head)          = "/usr/bin/head";
my($headlines)     = "-1";
my($basename)      = "/usr/bin/basename";
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------
sub errorexit {
   print ("$0:  Error:  @_.\n");
   exit(21);
}
#-----------------------------------------------------------------------

#-----------------------------------------------------------------------
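sub get_load {
  my($wub)=`/usr/bin/uptime`;
  # Linux prints "load average:", the BSDs print "load averages:";
  # the values may be separated by commas or by plain spaces.
  my($one,$two,$three) =
     ($wub =~ /load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/);
  if (! defined $one) {
     # Couldn't parse uptime; warn and treat the load as 0 so MRTG still runs.
     &debug("Couldn't parse load from uptime output: $wub",0);
     return(0);
  }
  &debug("Values were: $one, $two, $three",3);
  return($one);
}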
#-----------------------------------------------------------------------

 

#-----------------------------------------------------------------------
sub GetDirectory {
   my($conffile)=@_;
   my($junk, $directory) = ("","");
   open(CONFIGFILE,"$conffile") || die "Couldn't open file $conffile: $!";
   &debug("About to work on $conffile",2);
   while (<CONFIGFILE>) {
       &debug("Working on line: $_",4);
       if ( /^WorkDir:/ ) {
          ($junk,$directory) = split(" ",$_);
          &debug("WorkDir is: $directory",2);
       }
   }
   close(CONFIGFILE);
   if ( $directory eq "" ) {
          die "Couldn't find a WorkDir for config file $conffile";
   }
   return($directory);
}
#-----------------------------------------------------------------------


#-----------------------------------------------------------------------
sub PIDexists {
   my($pid) = @_;
   my($result);
#   my($cmd_str) = "$ps $psflags | $grep -v $grep | $grep $grepflags $pid";
   my($cmd_str) = "$ps $psflags $pid";
   &debug("About to do: $cmd_str",3);
#   $result=`$cmd_str`;
   $result=system("$cmd_str");
# system returns 0 if ps found the PID, 256 (exit status 1) if not.
   if ($result==0) {
# PID exists
     return 1;
   } else {
# PID doesn't exist 
     return 0;
   }
}
#-----------------------------------------------------------------------


#-----------------------------------------------------------------------
# Check for lockfile in a given directory; if none, return false,
# otherwise check to see if the process is still running; if not,
# remove it and return false, otherwise return true
sub IsDirectoryLocked {
   my($workdir,$lockfile) = @_;
   my($cmd_str,$oldpid,$result);
   &debug("Checking for lock in: $workdir",2);
   # $lockfile=$workdir."/mrtg.lock";
   &debug("opening lock file $lockfile",2);
   # if I can't stat it, chances are it doesn't exist
   my($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
      $atime,$mtime,$ctime,$blksize,$blocks) = stat $lockfile or return 0;
   open(OLDPID,"$lockfile") || die "Couldn't open old PID file $lockfile: $!";
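   $oldpid=<OLDPID>;
   close OLDPID;
   $oldpid = "" unless defined $oldpid;
   chomp($oldpid);
   # Note: "! $oldpid >= 1" parses as "(! $oldpid) >= 1", so test explicitly.
   &debug("$lockfile exists, but oldpid is '$oldpid'",0)
      unless ($oldpid =~ /^\d+$/ and $oldpid >= 1);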
#   $cmd_str= "$ps $psflags | $grep -v $grep | $grep $grepflags $oldpid";
   $cmd_str= "$ps $psflags $oldpid | $grep -v $PIDLINE";
   &debug("About to do: $cmd_str",3);
   $result=`$cmd_str`;
   if ((! $result) and (! -e $lockfile )) {
      # lockfile vanished since I tried to do the PS, and there's no
      # running process--yay, we're golden!
      return 0;
   }
   if (! $result) {
      # since we didn't match the above clause, must mean there's
      # no process, but we still have a lockfile--deal with it.
      &debug("Caught ORPHANED lock file: unlinking $lockfile",0);
      &debug("command was: $cmd_str",0);
      &debug("LS before unlink attempt:",0);
      system "/bin/ls","-al","$lockfile";
      my($ul_res)=unlink $lockfile;
      &debug("LS AFTER unlink attempt:",0);
      ##system "/bin/ls","-al","$lockfile";
      &debug("UNLINK of $lockfile failed--result $ul_res",0) if ($ul_res != 1);
      return 0 if ($ul_res eq 1);
      return 3;
   } 
   &debug("Result was: $result",2);
   # We had a lockfile, and that process is still running; how old is it?
   my($current)=time;
   if ($ctime < 1000) {
      &debug("STAT error on $lockfile--CTIME is $ctime",0);
      &debug("using MTIME instead: $lockfile--MTIME is $mtime",0);
      $ctime=$mtime;
      if ($mtime < 1000) { 
         &debug("CRAP--mtime is bogus too--trying ls -al $lockfile",0);
         system "/bin/ls","-al","$lockfile";
      }
   } 
   my($deltatime)= $current - $ctime;
   if ( $deltatime < $max_runtime ) {
# Young file, quietly exit.
      return 1;
   }
# Otherwise, it's an old file, and we should do something noisy to
# let the user know a process may be hung.  Later, we'll add code to
# actually kill the process.
   if ($deltatime > $max_runtime) {
      &debug("In mrtg.wrapper, directory $workdir has been LOCKED for more than $max_runtime seconds ($deltatime seconds)(Time is $current, Ctime is $ctime), but is NOT orphaned!",0); 
   } 
# Let's try killing it:
#      my($count) = 0;
#      my($pidexists) = 1;
#      while ($pidexists == 1) {
#         print STDERR "KILLING $oldpid, COUNT $count\n";
#         if ($count < 3 ) {
#            kill 1, $oldpid;
#         } else {
#            kill 9, $oldpid;
#         }
#         sleep 20;
#         $count++;
#         $pidexists = &PIDexists($oldpid);
#         $pidexists = 0 if ($count > 5);
#      }
#      unlink $lockfile;
   return 2;
}
#-----------------------------------------------------------------------


#-----------------------------------------------------------------------
sub debug {
   my ($message, $level) = @_;
   $level = 0 unless $level;
   if ($level <= $DEBUG) {
      my ($i);
      print STDERR "D: ";
      for ($i = 0; $i < $level; $i++) {
         print STDERR "  ";
      }
      print STDERR "$message\n";
   }
}
#-----------------------------------------------------------------------


# MAIN
#-----------------------------------------------------------------------
&debug("My config file should be: $config_file",1);
my($mydir) = &GetDirectory($config_file);
my($cmd_str) = "$basename $config_file .cfg";
&debug("About to do $cmd_str",2);
my($result) = `$cmd_str`;
chomp($result);
&debug("after chomp, result was: $result",2);
my($device)="$result";
my($mypidfile)="$mydir/mrtg.$device.lock";
&debug("global PIDFILE is $mypidfile",3);
if ( $mydir eq "" ) { die "Couldn't get working directory $!"; }
my($res)=&IsDirectoryLocked($mydir,$mypidfile);
if ($res) { 
   &debug("Directory $mydir is LOCKED! $mypidfile--skipping this run",1);
} else {
   my($sleep_count) = 0;
   my($load_high) = 0;
   my($cur_load) = &get_load;
LOAD: while($cur_load >= $max_load) {
      &debug("Load of $cur_load is over $max_load, sleep count is $sleep_count, and we need to do $mydir: $device",1);
      $load_high=1;
      if ($sleep_count > 10) {
         &debug("Not sure if the die is being caught or not...",2);
         die "Can't proceed, load is $cur_load, max_load is $max_load -- still too high after 110 seconds of waiting";
      }
      sleep 10;
      $sleep_count++;
      $cur_load = &get_load;
   }

   &debug("We can proceed; load average passed",1) if ($load_high);


   open(PIDFILE,">$mypidfile") || die "Couldn't open $mypidfile: $!";
   print PIDFILE $$;
   close(PIDFILE);
   # Pass the arguments as a list so no shell is involved and config
   # paths with spaces or shell metacharacters survive intact.
   system($mrtg, @ARGV);
   unlink $mypidfile;
}
#-----------------------------------------------------------------------

mpetach at tftp1:/home/mrtg>  
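
One gap in the wrapper above is that it checks for the lock file and then creates it in two separate steps, so two cron runs starting at the same moment could both decide the directory is free. A minimal sketch of closing that gap, assuming Perl 5.6 or later and a local (non-NFS) filesystem, is to create the lock with O_CREAT|O_EXCL so the check and the create happen as one operation. The names acquire_lock and release_lock are hypothetical and are not part of the script above.

```perl
use strict;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper, not part of the wrapper above.
# Returns 1 if we created the lock file, 0 if it already existed.
sub acquire_lock {
    my ($lockfile) = @_;
    my $fh;
    # O_CREAT|O_EXCL fails with EEXIST if the file is already there,
    # so "does it exist?" and "create it" are a single atomic step.
    unless (sysopen($fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644)) {
        return 0 if $!{EEXIST};    # another run holds the lock
        die "Couldn't create $lockfile: $!";
    }
    # Record our PID so a later run can check whether we are still alive.
    print $fh "$$\n";
    close($fh) or die "Couldn't write $lockfile: $!";
    return 1;
}

# Hypothetical helper: remove the lock; returns true if it was removed.
sub release_lock {
    my ($lockfile) = @_;
    return unlink($lockfile);
}
```

In the main section this would stand in for the open/print/close on the PID file, with the run skipped whenever acquire_lock returns 0. Unlike the original, this sketch writes a trailing newline after the PID, so whatever reads the file back should chomp it.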

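The "killing the old process" part is still commented out in the script above. As a rough sketch of how it might be finished, the helpers below check for a live PID with signal 0 instead of shelling out to ps, then ask a stale process to exit with TERM before falling back to KILL. Note that this swaps in TERM where the commented-out code used signal 1 (HUP); pid_alive, terminate_pid, and the default retry counts are all hypothetical choices, not something taken from MRTG or from the wrapper.

```perl
use strict;

# Hypothetical helper: true if a process with this PID exists.
sub pid_alive {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;
    # Signal 0 delivers nothing; it only checks that the PID can be signalled.
    return 1 if kill(0, $pid);
    # EPERM means the process exists but belongs to another user.
    return $!{EPERM} ? 1 : 0;
}

# Hypothetical helper: ask $pid to exit, escalating from TERM to KILL.
# Returns 1 once the process is gone, 0 if it survived every attempt.
sub terminate_pid {
    my ($pid, $tries, $wait) = @_;
    $tries = 3 unless defined $tries;   # assumed default, tune to taste
    $wait  = 5 unless defined $wait;    # seconds between attempts
    for my $attempt (1 .. $tries) {
        return 1 unless pid_alive($pid);
        # Be polite first; use KILL only on the final attempt.
        kill(($attempt < $tries ? 'TERM' : 'KILL'), $pid);
        sleep $wait;
    }
    return pid_alive($pid) ? 0 : 1;
}
```

In IsDirectoryLocked this could run in the branch where the lock is older than $max_runtime, with the lock file unlinked only after terminate_pid returns 1. One caveat: a zombie still counts as alive for signal 0, which doesn't matter here because the stale MRTG run is a child of cron rather than of the wrapper.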