[rrd-developers] holt-winters smoothing
Evan Miller
emiller at imvu.com
Tue Aug 21 22:41:17 CEST 2007
The Holt-Winters prediction code runs a data smoother periodically so that the
first few periods of predictions look nice and clean. There are a number of
flaws in the smoother, and I'd like to discuss what to do about them.
1. The point during the period at which the smoother runs is calculated by
hashing the filename. Look at rrd_create.c, line 586:
    current_rra->par[RRA_seasonal_smooth_idx].u_cnt = hashed_name % period;
If you ask me, this is completely nonsensical. It means that if you create two
identical archives and feed them identical data, they rarely produce identical
predictions. This makes regression testing frustrating until you learn the big
"secret" of naming your archives the same thing.
2. The smoother runs at different times depending on whether RRDtool received
the data as individual updates or as one bulk update containing the same
values. From a comment in rrd_update.c:
/* calling the smoothing code here guarantees at most one smoothing
* operation per rrd_update call. Unfortunately, it is possible with bulk
* updates, or a long-delayed update for smoothing to occur off-schedule.
* This really isn't critical except during the burn-in cycles. */
This behavior is counter-intuitive and complicates testing.
3. The smoother takes a running average over 5% of a season. This 5% figure is
hard-coded, and has produced quite awful predictions when I've set the season
to be a week, since each point becomes an average over about 8 hours.
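For reference, here is a minimal sketch of the kind of smoother described in #3: a centered running average over the seasonal coefficients that wraps around the end of the season. It is not RRDtool's own smoothing code. The function names and the `window_frac` parameter (standing in for the hard-coded 5%) are hypothetical, and a 300-second step is assumed. Five percent of a week is about 8.4 hours regardless of step; with a 300-second step a one-week season is 2016 rows, and 5% of that gives a 101-row window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: rows in the smoothing window, rounded to an odd count
 * so the window can be centered on each row. */
size_t smoothing_width(size_t period, double window_frac)
{
    size_t width = (size_t)(window_frac * (double)period);
    return (width % 2 == 0) ? width + 1 : width;
}

/* Replace each seasonal coefficient with the mean of a centered window that
 * wraps around the end of the season. Returns 1 if smoothing was applied,
 * 0 if the window is too small or too large to use, -1 on allocation failure.
 * This is a sketch, not RRDtool's implementation. */
int smooth_seasonal(double *coef, size_t period, double window_frac)
{
    assert(coef != NULL && period > 0);
    size_t width = smoothing_width(period, window_frac);
    if (width < 3 || width > period)
        return 0;

    double *out = malloc(period * sizeof *out);
    if (out == NULL)
        return -1;

    size_t half = width / 2;
    for (size_t i = 0; i < period; i++) {
        double sum = 0.0;
        for (size_t k = 0; k < width; k++)
            sum += coef[(i + period - half + k) % period]; /* wrap around the season */
        out[i] = sum / (double)width;
    }
    for (size_t i = 0; i < period; i++)
        coef[i] = out[i];
    free(out);
    return 1;
}

/* Demo: a season that is flat except for a one-row spike of height 1.0 at
 * row 0. Returns the height of that row after smoothing. */
double spike_after_smoothing(size_t period, double window_frac)
{
    double *coef = calloc(period, sizeof *coef);
    if (coef == NULL)
        return -1.0;
    coef[0] = 1.0;
    smooth_seasonal(coef, period, window_frac);
    double peak = coef[0];
    free(coef);
    return peak;
}
```

With `period = 2016` and `window_frac = 0.05`, the one-row peak is spread over 101 rows and comes out at 1/101 of its original height. Exposing `window_frac` as a per-archive parameter would let a weekly season use a much narrower window.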
I'm not sure what to do about #2, but I think #1 should be made more
consistent (say, run at a fixed interval), and #3 should perhaps be an input
parameter, although I don't know the right interface for it.
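A toy model of the two scheduling rules may make the proposal for #1 more concrete. The sketch assumes a 2016-row season and uses the well-known djb2 string hash as a stand-in for whatever hash rrd_create.c actually computes. The names `offset_from_name`, `fixed_offset` and `should_smooth` are hypothetical, and smoothing on the last row of each season is just one possible choice of fixed interval.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in filename hash (djb2). The hash used by rrd_create.c is not shown
 * here; any deterministic string hash demonstrates the same effect. */
uint64_t djb2_hash(const char *s)
{
    uint64_t h = 5381;
    for (; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* Current rule, modeled on rrd_create.c: the row within the season at which
 * the smoother runs depends on the archive's file name. */
uint64_t offset_from_name(const char *filename, uint64_t period)
{
    assert(period > 0);
    return djb2_hash(filename) % period;
}

/* Hypothetical fixed-interval rule: every archive smooths at the same row,
 * here the last row of each season, regardless of its name. */
uint64_t fixed_offset(uint64_t period)
{
    assert(period > 0);
    return period - 1;
}

/* Nonzero when the smoother is due at absolute row `row`. */
int should_smooth(uint64_t row, uint64_t period, uint64_t offset)
{
    return row % period == offset;
}
```

Under the name-based rule, `a.rrd` and `b.rrd` get different offsets for the same 2016-row season, so two archives fed identical data smooth at different points. Under the fixed rule both are due at row 2015 of every season.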
At IMVU, we've disabled the smoother entirely because of these problems. I
wouldn't be averse to ripping it out of RRDtool altogether, but I imagine others
find the feature useful.
Thoughts?
Evan