[smokeping-users] Querying rrd's directly.

Gregory Sloop gregs at sloop.net
Sat Jan 5 02:40:48 CET 2019

So, I know querying the RRD isn't exactly a smokeping problem - but I think it's an appropriate place to start.

I'm attempting to write a Nagios/OMD plugin.
Yes, there is a smokeping plug-in currently, but the problem I'm trying to solve is this...

I've had cases where latency or packet loss goes up, consistently, and I'd like to get alerts.
But I don't want alerts when a single sample gets, say, 3% loss, or latency jumps 30%. If I measured that over, say, 20 minutes, an hour, or four hours, then I could set limits that would be a lot tighter than I would for a single sample.

For example, if packet loss is greater than 2% for an hour - well we've probably got a problem. Same with latency. It might go up for someone's upload/download - but if it climbs 40% for four hours, then it's a problem we ought to look at.

With the smokeping plugin or Nagios's TCP probe - you can really only look at the result for a single sample [essentially], not an average. 

Thus, you end up setting limits that are far outside of what might actually constitute a problem, because you might have that happen for a few minutes, perhaps a few times a day, and you don't want Nagios [or smokeping] to alert on all those instances. So, that means you inevitably miss events that are important.

So, I want a smokeping plug-in that can be set to average the last X minutes/hours/whatever of loss/latency/jitter and generate warning/critical events from that average.
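
To make the idea concrete, the windowed check could be sketched roughly like this. It's only an illustration: the sub names `window_average` and `nagios_status`, the sample rows, and the thresholds are all made up for the example, and it assumes the rows are shaped like the array of arrays that RRDs::fetch returns, with undefined entries for missing samples. The return values are the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average one column over all rows, skipping undefined (missing) samples.
# Returns undef if no row had a value in that column.
sub window_average {
    my ($rows, $col) = @_;
    my ($sum, $count) = (0, 0);
    foreach my $row (@$rows) {
        my $val = $row->[$col];
        next unless defined $val;
        $sum += $val;
        $count++;
    }
    return $count ? $sum / $count : undef;
}

# Map an averaged value to a Nagios exit code.
sub nagios_status {
    my ($avg, $warn, $crit) = @_;
    return 3 unless defined $avg;   # UNKNOWN: no data in the window
    return 2 if $avg >= $crit;      # CRITICAL
    return 1 if $avg >= $warn;      # WARNING
    return 0;                       # OK
}

# Hypothetical sample rows: [loss in percent, latency in seconds]
my @sample = (
    [ 0,     0.020 ],
    [ 5,     0.022 ],
    [ undef, undef ],   # a missing sample
    [ 4,     0.021 ],
);

my $avg_loss = window_average(\@sample, 0);       # (0 + 5 + 4) / 3
my $status   = nagios_status($avg_loss, 2, 10);   # warn at 2%, crit at 10%
print "average loss: $avg_loss, status: $status\n";
```

In a real plugin the status would be passed to `exit` after printing a one-line summary, and the rows would come from the RRD rather than from a hard-coded array.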

So, I need to query the RRDs and pull stats.

Ok, now that I've got you so far [Thanks by the way!] - here's the problem I've got.
[I'm a terrible coder, I have a short attention span, I am even worse at Perl, and I hate details! So, be patient with me!]

Code snippet: [I stole this off the web somewhere, I don't recall where...]
#!/usr/bin/perl -W

use lib qw( /usr/lib/arm-linux-gnueabihf/perl5/5.20 ../lib/perl );
use RRDs;
use POSIX qw(strftime);

# start_time is the oldest data point, and end_time is the newest.
my $cur_time = time();             # current time
my $end_time = $cur_time - 60;     # end of the window: 1 minute ago
my $start_time = $end_time - 600;  # start of the window: 10 minutes before end_time
my $rrd_res = 60;                  # requested resolution in seconds
my $temp_var = "";

#$f_cur_time = ctime($cur_time);
#$f_end_time = ctime($end_time);
#$f_start_time = ctime($start_time);
#$f_end_time = ctime($end_time);

print "CT: $cur_time \n";
print strftime("%m/%d/%Y %H:%M:%S",localtime($cur_time));
print "\n \n";

print "ET: $end_time \n";
print strftime("%m/%d/%Y %H:%M:%S",localtime($end_time));
print "\n \n";

print "ST: $start_time \n";
print strftime("%m/%d/%Y %H:%M:%S",localtime($start_time));
print "\n \n";

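# fetch average values from the RRD database between start and end time
my ($start,$step,$ds_names,$data) =
    RRDs::fetch("/var/lib/smokeping/Some-CPE.rrd", "AVERAGE",
                "-r", "$rrd_res", "-s", "$start_time", "-e", "$end_time");

# stop here if the fetch failed (bad path, permissions, ...)
my $err = RRDs::error;
die "RRDs::fetch failed: $err\n" if $err;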

# save fetched values in a 2-dimensional array
my $rows = 0;
my $columns = 0;
my $time_variable = $start;

print "Start: $start : ";
print strftime("%m/%d/%Y %H:%M:%S",localtime($start));
print "\n \n";
print "step: $step \n";

print "start loop \n";
print " --- \n";
foreach $line (@$data) {
  # column 0 of each row holds that row's timestamp
  $vals[$rows][$columns] = $time_variable;
  $temp_var = $time_variable;
  print strftime("%m/%d/%Y %H:%M:%S",localtime($temp_var));
  print "\n";

  $time_variable = $time_variable + $step;
  $temp_var = $time_variable;
  print strftime("%m/%d/%Y %H:%M:%S",localtime($temp_var));
  print "\n";
  foreach $val (@$line) {
    print " --- \n";
    print "row: $rows - col: $columns \n";
    print "Val: $val ";
    $vals[$rows][++$columns] = $val;
    print "VC: $vals[$rows][$columns] \n";
    print " --- \n";
  }
  $rows++;        # move on to the next row
  $columns = 0;   # reset the column counter for the next row
}


I've put in a bunch of print statements so I can try to figure out what's going on. [You can ignore all that...]
There are also some errors in the for loop, because it parses more rows than exist in the fetch - but ignore that too. [At least for now. Or you can tell me why - if you like. I'm pretty sure I'll figure it out.]

But what's interesting [at least right now] is that the first two columns have issues.
Column one [or the first returned value from every row] appears to always be null.
And the second always appears to be zero.
[At least in my case, with my RRDs.]
But I'm pretty sure it's the same with any RRD from smokeping.

I may not understand [almost certainly don't] what's going on, but I'd have expected the values in columns 3-23 to start at 1 and go through 20. [I do 20 samples in this RRD per row.]

So, can someone explain why the first value [column] is always null, and the second is always zero? [These are all full resolution samples, no aggregation has occurred.]
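
One thing that might help narrow this down: RRDs::fetch also hands back the data-source names (the `$ds_names` array reference in the snippet above), so each value can be printed next to its name instead of next to a column number. A rough sketch, with a made-up `label_row` sub and made-up names and values standing in for a real fetch; it assumes the names come back in the same order as the values in each row:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pair each value in a row with its data-source name.
# Undefined values (nothing stored for that sample) are shown as "undef".
sub label_row {
    my ($names, $row) = @_;
    my @pairs;
    for my $i (0 .. $#$names) {
        my $val = defined $row->[$i] ? $row->[$i] : 'undef';
        push @pairs, "$names->[$i]=$val";
    }
    return join(' ', @pairs);
}

# Hypothetical stand-ins for $ds_names and one row of @$data
my $names = [ 'ds_a', 'ds_b', 'ds_c' ];
my $row   = [ undef, 0, 0.0213 ];

print label_row($names, $row), "\n";
```

With the real `$ds_names` and `@$data` passed in, the output would show which data source the always-null and always-zero columns actually belong to.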

Thanks to anyone who takes a stab at it.
And if you're reading Tobi, I'd be glad for your input and/or thoughts.

