[rrd-users] counting errors in rrd

mathew anderson snotling at gmail.com
Fri Mar 4 17:16:49 CET 2011

Thanks for the reply.

As for normalizing the data, I suspected that was what was going on, so
thank you for clarifying it. I have started to look at the link you posted
and will go over it in more detail.  It will help.  I can easily see how a
CDP's result will be different than an actual outage with the way I have
it set up.
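
To check my understanding of the example in the quoted reply below, here is
a rough Python sketch of the normalisation step. It is a simplified model
and not rrdtool's actual code: it assumes each value holds for the interval
since the previous update, it ignores heartbeats and unknown data, and it
labels each step by its start time. The name normalise is made up for this
illustration.

```python
def normalise(updates, step):
    """Time-weighted average per step for a list of (timestamp, value).

    Assumption: each value applies to the interval since the previous
    update. Only steps that are fully covered by updates are returned,
    keyed by the step's start time.
    """
    totals = {}   # step index -> sum of value * seconds
    covered = {}  # step index -> seconds covered by updates
    for (t0, _), (t1, value) in zip(updates, updates[1:]):
        t = t0
        while t < t1:
            idx = t // step
            end = min(t1, (idx + 1) * step)
            totals[idx] = totals.get(idx, 0.0) + value * (end - t)
            covered[idx] = covered.get(idx, 0) + (end - t)
            t = end
    return {idx * step: totals[idx] / step
            for idx in sorted(totals) if covered[idx] == step}


# One update per minute at 20s past the minute; a single 100 among zeros.
updates = [(20 + 60 * i, 100 if i == 3 else 0) for i in range(12)]
pdps = normalise(updates, 60)

# The single 100 is split across two adjacent steps (about 66.7 and 33.3).
nonzero = {start: round(v, 1) for start, v in pdps.items() if v}

# Averaging ten consecutive steps that contain both pieces gives about 10.
average = sum(pdps.values()) / len(pdps)
```

So a single 100 never comes back out as 100 unless the update times land
exactly on step boundaries, which explains why counting 100s did not work.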

I'll take a look at count.  It seems to be a nicer way of doing this, but
it will take me some time to get it set up.
The ultimate goal of what I am doing is to show a graph of the response
time with our SLA times marked, and with outages inside and outside SLA
times shown (which is what I have set up now).  The next step will be to
figure out the total SLA availability and total availability of the service
that I am monitoring.  I was hoping to use the flags for this, realizing
that it is not a true measure of uptime: checking every 5 mins can show a
10 min outage, even though the service was only down for the moment it took
to do the check.
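
For the availability numbers, something along these lines is what I have in
mind. It is only a sketch in Python working from the raw probe results
rather than from the RRD. The 08:00-18:00 UTC window is a made-up SLA
period, and the names in_sla and availability are just for illustration.
Every failed probe is counted as a whole probe interval of downtime, so the
result is an estimate and not true uptime.

```python
from datetime import datetime, timezone

PROBE_INTERVAL = 300  # seconds between checks (every 5 mins)


def in_sla(ts, start_hour=8, end_hour=18):
    """Hypothetical SLA window: 08:00-18:00 UTC, every day."""
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
    return start_hour <= hour < end_hour


def availability(probes, only_sla=False):
    """Fraction of probes that succeeded.

    probes is a list of (timestamp, failed) pairs. With only_sla=True,
    probes outside the SLA window are ignored. Returns None if no probe
    qualifies.
    """
    chosen = [failed for ts, failed in probes if not only_sla or in_sla(ts)]
    if not chosen:
        return None
    return 1 - sum(chosen) / len(chosen)


# One day of probes with two failures: 09:00 (inside the assumed SLA
# window) and 22:00 (outside it).
failures = {9 * 3600, 22 * 3600}
probes = [(i * PROBE_INTERVAL, i * PROBE_INTERVAL in failures)
          for i in range(288)]

total = availability(probes)               # 1 - 2/288
sla = availability(probes, only_sla=True)  # 1 - 1/120
```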

Again, thanks for your time.


On Fri, Mar 4, 2011 at 2:58 AM, Simon Hobson <linux at thehobsons.co.uk> wrote:

> mathew anderson wrote:
> >I have a single RRD file that has the value 100 in places.  Whenever
> >my monitoring sees an error (probes every 5 mins), it pushes a 100
> >into an rrd file.  I am trying to figure out how many times this
> >value is in a given time range.
> Are you aware that except under certain very strict conditions, what
> is stored is NOT what you entered ?
> ALL input data is normalised, and then consolidated. If your data
> entry times don't exactly match step boundaries then normalisation
> will alter it. Suppose your step time was 1 minute (60 seconds),
> you'd been entering zeros, then at 20s past the minute you enter 100,
> and 20s past the next minute you enter zero again, and continue
> entering zeros. The normalisation means that for your one minute with
> a value of 100, 2/3 (ie 66.6) will go into one step period, and the
> rest (33.3) will go into the next. So you'd get out 0, 0, 66.6, 33.3,
> 0, 0
> Then say you had a consolidation for 10 minute periods. The
> consolidated average for that would then be 10 (assuming both the
> non-zero normalised values fall into the same consolidated time
> period).
> See : http://www.vandenbogaerdt.nl/rrdtool/
> In particular the one on Rates, normalizing and consolidating
> Also, note that all time periods are referenced to unix epoch
> (midnight, 1st Jan 1970). So with a step time of 300, step periods
> start on the hour, 5 minutes past the hour, etc. If you consolidate 6
> PDPs to a CDP (ie 1/2 hour) then these consolidated periods will be
> on the hour and half hour.
> Given that you seem to be logging errors, it may be better to log the
> error count rather than a flag. If the errors are reported as a
> count, then use a counter data type and rrd will take care of
> converting that to a rate. You can then get rrd graph to do logic
> such as "if rate > some_threshold then draw it in red".
> --
> Simon Hobson