[rrd-developers] RRD support for aberrant behavior detection
jakeb at microsoft.com
Fri Jun 23 00:00:03 MEST 2000
RRD Developers and Tobi Oetiker,
As you may be aware, at WebTV we make extensive use of RRDtool and Cricket.
We use them not only for real-time monitoring of network hardware, such as
router interfaces and switch ports, but also for real-time monitoring of software
applications and processes running on a large number of Solaris hosts.
We have long debated the best approach to incorporating real-time aberrant
behavior detection into the monitoring system. Cricket implements
thresholding; that is, it generates alerts (via email or SNMP traps) when a
time series for which it is enabled exceeds absolute bounds set in the Cricket
configuration files. This mechanism is simple and effective for detecting
some kinds of aberrant behavior.
However, there is a need for a more sophisticated algorithm for aberrant
behavior detection. Unfortunately, there is no uniformly best choice, but
there are desirable characteristics we can select for:
(1) Provides near real-time detection for the monitoring application.
(2) Adapts in real time as the time series evolves.
(3) Low computation and disk overhead.
(4) Easy to understand and tune.
Given these goals, and our reliance on RRDtool and Cricket, we are
proceeding to implement such an algorithm in RRDtool. While such
functionality could be encoded in another stand-alone application, the
primary motivation for adding this functionality to RRDtool is efficiency.
At WebTV, we are acutely aware that a small inefficiency, perhaps
imperceptible at the single-process level, can result in a significant
performance impediment as the number of processes scales up. A further
advantage of RRDtool is the ability to leverage the graphing capabilities
already included in RRDtool.
We would prefer that these enhancements to RRDtool, once completed and in
service here at WebTV, be incorporated into the public distribution of
RRDtool. That, of course, is a decision to be made by the RRD community.
There are a number of questions to be asked:
(1) What are the plans for the next major version of RRD? I know that
smoothing algorithms have already been proposed. The aberrant behavior
detection algorithm includes a smoothing algorithm as a subset.
(2) Should aberrant behavior detection be available in RRD? I think most
network administrators agree the functionality is desirable; the question
is whether it should be a part of RRD. This is the longstanding trade-off
between modular code and efficient code.
(3) If so, what algorithms should be used? I will freely admit that what we
are implementing at WebTV is not an optimal algorithm. However, many
algorithms are inappropriate for real-time monitoring, or are far too
complicated for a network technician to tune without a PhD consultant
looking over his shoulder.
A draft description of our implementation (already underway) is at
http://cricket.sourceforge.net/aberrant/rrd_hw.htm. The document is
primarily a discussion of implementation, not of the aberrant behavior
detection algorithm. This implementation touches many of the core C files of
RRD. At the same time, the RRD file structure on disk is unchanged. The enhanced
tool will run with existing RRD files. This backwards compatibility is
essential, because we know our aberrant behavior detection algorithm is only
appropriate for a subset of time series. In some cases simple thresholding
(as Cricket provides) is sufficient. In others, the processing cost of
aberrant behavior detection is too high relative to the potential benefit.
I invite your comments on this project. As with any of our modifications to
RRD, we want to share them with the RRD community as a patch. We are not
ready to do so yet, but plan to do so by the end of July. I thought it best
to start a discussion on this topic sooner rather than later.