[rrd-users] False positives with aberrant behavior detection

Mon Aug 16 18:17:02 CEST 2010

On Mon, 16 Aug 2010, Dave Plonka wrote:

> This would be easier for you to understand (why it's doing what it
> does) if you plot the confidence band - i.e., the line above and below
> the hwpredict value that the observations must exceed to be considered
> a violation.

I feel stupid for asking this, but how do I define the confidence band
and how do I get rrdgraph to print it? The rrdcreate page mentions the
confidence band several times but besides "defining a matching set of
several RRDs" I can't find instructions in there on how to set my
confidence band to a certain width.  It also references rrdgraph where
supposedly there is an example of a printed confidence band, but
searching for "confidence" on the rrdgraph page doesn't yield any
results.

I'll go through the references you've listed (thanks!) as soon as I get
a sec, but if you have a snippet of rrdtool code that uses/prints
confidence bands, I'd really appreciate it!

Thanks much, you've been a big help!

-- Mike

Mike Schilli
m at perlmeister.com

>
>> The data is from a temperature sensor, which has a resolution of .5
>> degrees Celsius. The data covers 7 days [1] and the rrdtool commands
>> I've used are available at [2]. For this example, I've used alpha=0.5,
>> beta=0.5, gamma=0.5, with a seasonal period of 60*24 (one day in
>> one-minute steps).
>>
>> What I've noticed so far:
>>
>> * The green line (rrdtool's prediction) is only available after the 3rd
>>    day. What's the reason for that?
>
> Prediction, i.e., the "hwpredict" value, is based on past observations;
> the algorithm needs prior data points to predict, therefore there is
> some time to bootstrap it for operations.  Once the HWPREDICT RRA is
> populated though, you won't have to wait again (as long as you don't
> have gaps in your data points/observations.)
>
>> * There's a clear jump in the middle of the graph which goes undetected.
>
> This can happen (by design) if you have the H-W RRD attributes set to
> only consider it errant if `n' samples fall outside the expected range
> within the configured window of points - since this is a very short
> duration anomaly (perhaps only one data point), it is not reported
> as an error.  That's configurable - see the "threshold" value you
> set in the FAILURES RRA.  The default is that 7 observations of 9
> must be out of the confidence band before it is reported as a failure
> (vs. the prediction).
>
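To make the 7-of-9 rule concrete, here is a minimal Python sketch of the
windowed test described above. It is not rrdtool's code: the function name
and the two sample series are made up for illustration, and only the
defaults (threshold 7, window 9) come from the explanation above.

```python
from collections import deque


def failures(violations, threshold=7, window=9):
    """Return one 0/1 failure flag per sample.

    `violations` is a sequence of 0/1 flags, where 1 means the observation
    fell outside the confidence band. A failure is flagged when at least
    `threshold` of the most recent `window` samples were violations.
    """
    recent = deque(maxlen=window)  # sliding window of violation flags
    flags = []
    for v in violations:
        recent.append(1 if v else 0)
        flags.append(1 if sum(recent) >= threshold else 0)
    return flags


# Hypothetical series: a one-sample spike and a sustained level shift.
spike = [0] * 5 + [1] + [0] * 5
shift = [0] * 5 + [1] * 10

if __name__ == "__main__":
    print("spike:", failures(spike))
    print("shift:", failures(shift))
```

The one-sample spike never accumulates enough violations to be flagged,
while the sustained shift is flagged only once seven violations sit inside
the nine-sample window.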
>> * There's a high number of false positives, starting after the spike,
>> and continuing until the end of the graph. I've tried various
>> combinations of alpha, beta, and gamma to get rid of them but without
>> success.
>
> This would be easier to understand if you plot the confidence band.
> It looks to me like your band is way too tight.
>
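For reference, a rough sketch of how the band relates to the two stored
series. As far as I understand it, the band is the HWPREDICT value plus or
minus a multiple of the DEVPREDICT value, with a scaling factor that
defaults to 2 and can be changed with rrdtool tune (--deltapos and
--deltaneg). In rrdgraph the same arithmetic would be a CDEF along the
lines of upper=pred,dev,2,*,+ over DEFs that use the HWPREDICT and
DEVPREDICT consolidation functions. The Python below only illustrates the
arithmetic; the function names and the temperatures are invented, not
taken from the data in this thread.

```python
def confidence_band(pred, dev, delta_pos=2.0, delta_neg=2.0):
    """Return (lower, upper) bounds around a predicted value.

    pred: predicted value (what HWPREDICT stores)
    dev:  predicted deviation (what DEVPREDICT stores)
    delta_pos / delta_neg: scaling factors, assumed default of 2
    """
    return pred - delta_neg * dev, pred + delta_pos * dev


def is_violation(obs, pred, dev, delta_pos=2.0, delta_neg=2.0):
    """True if the observation falls outside the confidence band."""
    lower, upper = confidence_band(pred, dev, delta_pos, delta_neg)
    return obs < lower or obs > upper


if __name__ == "__main__":
    # Hypothetical numbers: the sensor moves in 0.5 degree steps, so one
    # quantization step above a prediction of 21.0 reads as 21.5.
    for dev in (0.1, 0.3):
        print(dev, confidence_band(21.0, dev), is_violation(21.5, 21.0, dev))
```

With a predicted deviation of 0.1 the band is narrower than a single 0.5
degree sensor step, so ordinary quantization already counts as a violation;
with 0.3 it does not. That is one way a band can end up "too tight".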
> If you haven't already, I suggest reading Jake Brutlag's original
> paper, available online from the LISA 2000 Conference:
>
>   "Aberrant Behavior Detection in Time Series for Network Service Monitoring"
>   http://www.usenix.org/events/lisa00/brutlag.html
>
> I've also done some work in which we used this H-W implementation
> for evaluation of our method; might be helpful:
>
>   "A Signal Analysis of Network Traffic Anomalies"
>   http://pages.cs.wisc.edu/~pb/paper_imw_02.pdf (sample parameters page 11 - 300 second step, IIRC)
>
>   "Traffic Anomaly Detection at Fine Timescales with Bayes Nets"
>   http://pages.cs.wisc.edu/~pb/icimp08_final.pdf (sample parameters page 8 - 1 second step)
>
> Note that the HW parameters can be very sensitive to your "step" value.
> So, don't expect defaults to work if they were meant for a 300 second
> step, and you're using a 60 second step... as usual, it's best to
> understand them completely to choose reasonable values.
>
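One way to see the step sensitivity: in plain exponential smoothing, the
most recent n samples carry a total weight of 1 - (1 - alpha)^n. Holding
the wall-clock "memory" fixed while changing the step therefore changes
alpha. The sketch below only works through that arithmetic. The helper
names are made up, and the choice of "95% of the weight in the last hour"
is an arbitrary illustration, not a recommendation from this thread.

```python
import math


def alpha_for_memory(weight, memory_seconds, step_seconds):
    """Smoothing parameter such that the samples covering the last
    `memory_seconds` carry `weight` (0..1) of the total weight."""
    n = memory_seconds / step_seconds  # number of samples in that span
    return 1.0 - math.exp(math.log(1.0 - weight) / n)


def seasonal_rows(period_seconds, step_seconds):
    """Number of rows in one seasonal period at the given step."""
    return period_seconds // step_seconds


if __name__ == "__main__":
    for step in (300, 60):
        a = alpha_for_memory(0.95, 3600, step)
        rows = seasonal_rows(86400, step)
        print(f"step={step}s alpha={a:.4f} rows_per_day={rows}")
```

The same one-hour memory needs a much smaller alpha at a 60 second step
than at a 300 second step, and the one-day seasonal period grows from 288
rows to 1440 rows (the 60*24 mentioned earlier).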
> Dave
>
> --
> plonka at cs.wisc.edu  http://net.doit.wisc.edu/~plonka/  Madison, WI
>