[rrd-users] trigger an alert?
sveniu at opera.com
Thu May 3 23:19:32 CEST 2007
John Conner wrote:
> Thanks a lot, Sven!
> Still fairly new to rrdtool and never used the "updatev" option, gonna
> check it right now.
> Do you have any documents handy on how you implement this? if you do,
> could you point me the link?
Sure, here are some quick notes on how to set up aberrant behaviour
detection for a data value. My example is based on actual monitoring
of a network link with somewhat strong periodical behaviour; that is,
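you can easily identify a repeating (daily) pattern in the traffic.

This is the rrdtool create command I use. I've added comments to
some of the lines; remove them before running the command, since the
shell will not accept text after a line-continuation backslash: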
rrdtool create network-uplink.rrd \
    --start 1166600000 \
    --step 120 \                            # sample every 2 minutes
    DS:pktsin:DERIVE:180:0:4294967295 \     # maintain counters for packets
    DS:bytesin:DERIVE:180:0:4294967295 \    # and bytes, inbound and outbound
    RRA:AVERAGE:0.5:1:840 \                 # day graph
    RRA:AVERAGE:0.5:15:384 \                # week graph
    RRA:AVERAGE:0.5:60:384 \                # month graph
    RRA:AVERAGE:0.5:720:400 \               # year graph
    RRA:HWPREDICT:1440:0.05:0.0035:720:6 \  # Detailed notes below
About the last RRAs:
- HWPREDICT is set up to use a seasonal period of 720 datapoints.
720 datapoints with intervals of 2 minutes equals exactly 24 hours.
I.e., the traffic pattern repeats every day. You might want to use
an entire week as the seasonal period, depending on your patterns.
- The alpha,beta,gamma values are not all that easy to tune properly
to your data source, in my opinion. I've chosen fairly generic
values, based on those found in
- HWPREDICT is given rra-num 6, SEASONAL is given 5, and so on. The
rra-num argument was not entirely easy to figure out from the
documentation, which states "The rra-num argument is the 1-based
index in the order of RRA creation (that is, the order they appear
in the create command)." It simply refers to the position of the
RRA being pointed at, counting from 1 (this includes *all* RRAs,
AVERAGE too!). HWPREDICT should refer to the SEASONAL index,
SEASONAL to HWPREDICT, DEVSEASONAL to HWPREDICT, DEVPREDICT to
DEVSEASONAL and FAILURES to DEVSEASONAL.
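
The two bits of arithmetic in the notes above (seasonal period in
datapoints, and which rra-num each RRA gets) can be checked with a
small Python sketch. The creation order of the five aberrant-behaviour
RRAs used here (HWPREDICT, SEASONAL, DEVSEASONAL, DEVPREDICT, FAILURES,
after the four AVERAGE RRAs) is an assumption for illustration, and the
helper names are hypothetical; the dependency table follows the
rrdcreate manual page.

```python
STEP_SECONDS = 120  # --step from the create command


def seasonal_rows(period_seconds, step_seconds=STEP_SECONDS):
    """Number of datapoints that make up one seasonal period."""
    rows, remainder = divmod(period_seconds, step_seconds)
    if remainder:
        raise ValueError("period must be a whole multiple of the step")
    return rows


# Which RRA each aberrant-behaviour RRA must point at via rra-num.
DEPENDS_ON = {
    "HWPREDICT": "SEASONAL",
    "SEASONAL": "HWPREDICT",
    "DEVSEASONAL": "HWPREDICT",
    "DEVPREDICT": "DEVSEASONAL",
    "FAILURES": "DEVSEASONAL",
}


def rra_nums(rra_order):
    """Map each aberrant-behaviour RRA to the 1-based index it refers to."""
    position = {}
    for index, cf in enumerate(rra_order, start=1):
        # AVERAGE occurs several times; only the first position is kept,
        # which is fine because nothing depends on AVERAGE.
        position.setdefault(cf, index)
    return {cf: position[dep] for cf, dep in DEPENDS_ON.items() if cf in position}


# Assumed creation order: four AVERAGE RRAs, then the five detection RRAs.
ORDER = ["AVERAGE"] * 4 + [
    "HWPREDICT", "SEASONAL", "DEVSEASONAL", "DEVPREDICT", "FAILURES",
]

if __name__ == "__main__":
    print(seasonal_rows(24 * 3600))      # one day at a 2-minute step
    print(seasonal_rows(7 * 24 * 3600))  # one week at a 2-minute step
    print(rra_nums(ORDER))
```

With that order, HWPREDICT points at 6 and SEASONAL at 5, matching the
numbers mentioned above. A weekly seasonal period at the same step
would be 5040 datapoints instead of 720.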
The next step, which I would have easily worked out had I read the
documentation properly, is to adjust the positive and negative
confidence band factors. The default is 2, which I find a bit too
unforgiving for my scenario. To adjust it to 5, run:
rrdtool tune network-uplink.rrd --deltapos 5 --deltaneg 5
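
To make the effect of the scaling factor concrete, here is a minimal
Python sketch of the confidence band: the predicted value plus or minus
the predicted deviation times the factor. The function names and the
sample numbers are made up for illustration; rrdtool does this
internally, and a single violation is not yet a failure, since the
FAILURES RRA counts violations within a window.

```python
def confidence_band(predicted, deviation, delta_pos=5.0, delta_neg=5.0):
    """Return (lower, upper) bounds around a predicted value."""
    lower = predicted - delta_neg * deviation
    upper = predicted + delta_pos * deviation
    return lower, upper


def is_violation(observed, predicted, deviation, delta_pos=5.0, delta_neg=5.0):
    """True if the observed value falls outside the confidence band."""
    lower, upper = confidence_band(predicted, deviation, delta_pos, delta_neg)
    return observed < lower or observed > upper


if __name__ == "__main__":
    # Hypothetical sample: prediction 1000 bytes/sec, deviation 50.
    print(confidence_band(1000.0, 50.0))               # factor 5
    print(confidence_band(1000.0, 50.0, 2.0, 2.0))     # default factor 2
    print(is_violation(1200.0, 1000.0, 50.0))          # inside the wider band
    print(is_violation(1200.0, 1000.0, 50.0, 2.0, 2.0))
```

The same observation that sits comfortably inside the band at factor 5
falls outside it at the default factor of 2, which is why the default
felt too strict for this link.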
Here's how I graph the daily graph for the inbound byte counter:
rrdtool graph \
    --font LEGEND:7 \
    --font UNIT:7 \
    --font AXIS:7 \
    --base 1024 \
    -l 0 -r \
    -w 400 \
    -h 125 \
    --start end-100800 \
    --title "Network traffic, by day" \
    --vertical-label "Bytes/sec" \
    --x-grid "HOUR:1:DAY:1:HOUR:4:0:%H:%M" \
    CDEF:dev_lower=a_pred,a_dev,5,*,- \  # Note we're using 5 as the scaling factor
    CDEF:dev_upper=a_pred,a_dev,5,*,+ \  # when graphing! Same as in the tune command.
    COMMENT:"Current\:" GPRINT:a_last:%6.2lf \
    COMMENT:"Average\:" GPRINT:a_average:%6.2lf \
    COMMENT:"Last update\: `date \"+%Y-%m-%d %H\\:%M\\:%S %Z\"`"\\r
Next, to actually have it report aberrant behaviour in real-time,
as opposed to post-mortem, you'll need a wrapper script to run
'rrdtool updatev' and parse the output. There are probably fancy
bindings in perl for this, or some other graceful way of doing it.
My way is a quick Python script that parses the output looking for
'FAILURES' and then determines whether the corresponding value is
greater than 0.0.
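
A rough sketch of such a parser follows. It is not the script
described above, just one way to do it. It assumes updatev prints
lines of the form `[timestamp]RRA[CF][pdp]DS[name] = value`; check that
against the output of your rrdtool version. The function names, the
sample text and the update string in the comment are hypothetical.

```python
import re
import subprocess

# Assumed shape of one updatev output line, for example:
#   [1178227172]RRA[FAILURES][1]DS[bytesin] = 1.0000000000e+00
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+)\]RRA\[(?P<cf>\w+)\]\[(?P<pdp>\d+)\]"
    r"DS\[(?P<ds>[^\]]+)\]\s*=\s*(?P<val>\S+)"
)


def run_updatev(rrd_path, update):
    """Run 'rrdtool updatev' and return its text output.

    'update' is an update string such as "N:12345:67890".
    """
    result = subprocess.run(
        ["rrdtool", "updatev", rrd_path, update],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def failing_sources(updatev_output):
    """Return names of data sources whose FAILURES value is above 0.0."""
    failed = []
    for line in updatev_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None or match.group("cf") != "FAILURES":
            continue
        try:
            value = float(match.group("val"))
        except ValueError:
            continue  # unparseable value, skip the line
        # NaN compares False here, so unknown values never alert.
        if value > 0.0 and match.group("ds") not in failed:
            failed.append(match.group("ds"))
    return failed


SAMPLE = """return_value = 0
[1178227172]RRA[AVERAGE][1]DS[bytesin] = 5.1000000000e+03
[1178227172]RRA[FAILURES][1]DS[bytesin] = 1.0000000000e+00
[1178227172]RRA[FAILURES][1]DS[pktsin] = 0.0000000000e+00
"""

if __name__ == "__main__":
    print(failing_sources(SAMPLE))
```

In real use the text would come from
run_updatev("network-uplink.rrd", ...) rather than from SAMPLE.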
Well, that's pretty much it. Good luck!
>> I use the aberrant behaviour detection in rrdtool and I find
>> it quite handy. To detect problems, i use the 'rrdtool updatev'
>> command, which will output FAILURE=1.0 (different syntax), if
>> it detects failures. FAILURE=0.0 if not. In other words, I parse
>> the output of the command, and trigger alerts based on it. You
>> should probably implement a wrapper around the parsing/alarming,
>> so that you won't get flooded with mails/SMS messages every five
>> minutes while a deviation is happening.
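
One way to avoid the flood of mails mentioned in the quoted text is to
alert when a data source first starts failing, and then only repeat
the alert after a quiet period. A minimal Python sketch follows; the
class name and the one-hour repeat interval are assumptions for
illustration.

```python
import time


class AlertThrottle:
    """Suppress repeated alerts while a deviation is ongoing."""

    def __init__(self, repeat_after=3600.0):
        self.repeat_after = repeat_after  # seconds between repeat alerts
        self.last_sent = {}               # data source name -> last alert time

    def should_alert(self, ds_name, failing, now=None):
        """Return True if an alert should be sent for this data source."""
        if now is None:
            now = time.time()
        if not failing:
            # Back to normal: forget the source so the next failure alerts.
            self.last_sent.pop(ds_name, None)
            return False
        last = self.last_sent.get(ds_name)
        if last is None or now - last >= self.repeat_after:
            self.last_sent[ds_name] = now
            return True
        return False


if __name__ == "__main__":
    throttle = AlertThrottle(repeat_after=3600.0)
    for t, failing in [(0, True), (120, True), (3600, True), (3720, False), (3840, True)]:
        print(t, failing, throttle.should_alert("bytesin", failing, now=t))
```

The state here lives in memory only. If the wrapper is started fresh
for every update (from cron, say), last_sent would have to be saved to
a file between runs.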