[rrd-users] HWPredict to filter outlier data on the fly?
david.c.purdy at gmail.com
Tue Jan 28 21:36:32 CET 2014
Is there a way to use HWPredict to filter errors (i.e. outlier data) from the data stream on the fly, at the moment the data is collected?
I have a 1-wire temperature sensor that is polled once a minute. Occasionally a reading is obviously garbage (noise, an outlier), and ideally it should be discarded. This is a GAUGE data source, and (normal) meteorological temperature data should be differentiable with respect to time: in the rough calculus sense, there can't be any nasty cusps or vertices in the data.
Here is an example showing such a cusp (the 19:24 value is bad):
<!-- 2014-01-26 19:19:00 CST / 1390785540 --> <row><v>1.2987500000e+01</v></row>
<!-- 2014-01-26 19:20:00 CST / 1390785600 --> <row><v>1.2987500000e+01</v></row>
<!-- 2014-01-26 19:21:00 CST / 1390785660 --> <row><v>1.2987500000e+01</v></row>
<!-- 2014-01-26 19:22:00 CST / 1390785720 --> <row><v>1.2880211103e+01</v></row>
<!-- 2014-01-26 19:23:00 CST / 1390785780 --> <row><v>1.2542394264e+01</v></row>
<!-- 2014-01-26 19:24:00 CST / 1390785840 --> <row><v>6.7108108375e-01</v></row>
<!-- 2014-01-26 19:25:00 CST / 1390785900 --> <row><v>1.0005696817e+01</v></row>
<!-- 2014-01-26 19:26:00 CST / 1390785960 --> <row><v>1.2200000000e+01</v></row>
<!-- 2014-01-26 19:27:00 CST / 1390786020 --> <row><v>1.1985339718e+01</v></row>
<!-- 2014-01-26 19:28:00 CST / 1390786080 --> <row><v>1.1975000000e+01</v></row>
Graphically, this might look like the spike at about 16:25 hrs: https://www.dropbox.com/s/zrvi15ez0zbqj2j/1wiretemps_showing_outlierspike.png
My current, simplistic solution is to discard any reading whose rate of change exceeds a real-world limit: 3.5 deg F per minute. For reference, the fastest recorded temperature drop is 27.2 °C (49 °F) in 15 minutes (Rapid City, South Dakota, 1911-01-10), which works out to about 3.3 deg F per minute. So currently I just test whether the absolute value of the temperature change over the last minute is less than 3.5 degrees. If so, I assume it is good data; otherwise, I discard it.
In effect, this requires the absolute value of the 1st derivative to be less than 3.5 deg F/min.
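To make the current check concrete, here is a minimal Python sketch of the rate-of-change test described above. It is not rrdtool code and not my actual polling script: the function name `filter_by_rate`, the `SAMPLES` list, and the choices to accept the first reading unconditionally and to compare each reading against the last accepted one (scaled by elapsed time) are assumptions made for illustration. The sample values are the dumped rows quoted earlier.

```python
# Hypothetical sketch of the rate-of-change filter; names are illustrative only.
MAX_RATE_F_PER_MIN = 3.5  # threshold from the post, in deg F per minute

# (epoch seconds, temperature in deg F), copied from the dump excerpt above
SAMPLES = [
    (1390785540, 1.2987500000e+01),
    (1390785600, 1.2987500000e+01),
    (1390785660, 1.2987500000e+01),
    (1390785720, 1.2880211103e+01),
    (1390785780, 1.2542394264e+01),
    (1390785840, 6.7108108375e-01),
    (1390785900, 1.0005696817e+01),
    (1390785960, 1.2200000000e+01),
    (1390786020, 1.1985339718e+01),
    (1390786080, 1.1975000000e+01),
]


def filter_by_rate(samples, max_rate=MAX_RATE_F_PER_MIN):
    """Split samples into (accepted, rejected) using |dT/dt| < max_rate.

    Assumption: the first sample is trusted, and each later sample is
    compared with the most recent accepted sample, not the previous raw one.
    """
    accepted, rejected = [], []
    for ts, temp in samples:
        if accepted:
            last_ts, last_temp = accepted[-1]
            minutes = (ts - last_ts) / 60.0
            # Reject non-increasing timestamps and changes at or above the limit.
            if minutes <= 0 or abs(temp - last_temp) / minutes >= max_rate:
                rejected.append((ts, temp))
                continue
        accepted.append((ts, temp))
    return accepted, rejected


accepted, rejected = filter_by_rate(SAMPLES)
print("rejected:", rejected)
```

On the rows quoted above, this rejects only the 19:24 sample (1390785840). Note that the 19:25 value, which also looks depressed, still passes because its change relative to the 19:23 reading is spread over two minutes.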
Is there perhaps a more intelligent, sophisticated, built-in way to do this using the Holt-Winters methods (HWPredict), or perhaps using the 2nd derivative as well?
If so, could you provide some details on this, perhaps showing the syntax and the RPN format?