[rrd-developers] holt-winters smoothing

Tue Aug 28 20:31:36 CEST 2007

Tobi, would this syntax work for you for specifying the smoothing window?

RRA:SEASONAL:seasonal period:gamma[:smoothing window]:rra-num

RRA:DEVSEASONAL:seasonal period:gamma[:smoothing window]:rra-num

...where smoothing window is a fraction of the period, between 0 (no 
smoothing) and 1 (a full season)?
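
To make the proposed parameter concrete, here is a minimal sketch of how a fraction of the period might be turned into a number of rows. `window_points` is a hypothetical helper, not a function in the RRDtool source, and both the clamping to [0, 1] and the rounding down are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical helper (not part of RRDtool): convert a smoothing window,
 * given as a fraction of the seasonal period, into a number of rows.
 * A fraction of 0 (or less) means "no smoothing"; values above 1 are
 * clamped to a full season. The result is rounded down. */
static unsigned long window_points(unsigned long period_rows, double fraction)
{
    assert(period_rows > 0);

    if (fraction <= 0.0)
        return 0;               /* smoothing disabled */
    if (fraction > 1.0)
        fraction = 1.0;         /* never more than one full season */

    return (unsigned long) (fraction * (double) period_rows);
}
```

Assuming a one-week season stored at 5-minute steps (2016 rows), `window_points(2016, 0.05)` gives 100 rows, a little over 8 hours, which is in line with the "about 8 hours" figure in the quoted message below.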

Evan

Tobias Oetiker wrote:
> Today Evan Miller wrote:
> 
>> The Holt-Winters prediction code runs a data smoother periodically so that the
>> first few periods of predictions look nice and clean. There are a number of
>> flaws in the smoother, and I'd like to discuss what to do about them.
>>
>> 1. The time the smoother runs during the period is calculated by hashing the
>> filename. Look:
>>
>> rrd_create.c line 586: current_rra->par[RRA_seasonal_smooth_idx].u_cnt =
>> hashed_name % period;
>>
>> If you ask me, this is completely nonsensical. It means that if you create two
>> identical archives and feed them identical data, they rarely produce identical
>> predictions. This makes regression testing frustrating until you learn the big
>> "secret" of naming your archives the same thing.
> 
> Evan, just guessing (Jake should know) ... the smoother produces
> extra load and therefore it should not run on all rrds at the same
> time ... can you judge what that load is?
> 
>> 2. The smoother runs at different times based on whether RRDtool has received
>> individual updates or a bulk update with identical data. From a comment in
>> rrd_update.c:
>>
>> /* calling the smoothing code here guarantees at most one smoothing
>>   * operation per rrd_update call. Unfortunately, it is possible with bulk
>>   * updates, or a long-delayed update for smoothing to occur off-schedule.
>>   * This really isn't critical except during the burn-in cycles. */
>>
>> This behavior is counter-intuitive and complicates testing.
> 
> yep
> 
>> 3. The smoother takes a running average over 5% of a season. This 5% figure is
>> hard-coded, and has produced quite awful predictions when I've set the season
>> to be a week, since each point becomes an average over about 8 hours.
>>
>> I'm not sure what to do about #2, but I think #1 should be made more
>> consistent (say, run at a fixed interval), and #3 should perhaps be an input
>> parameter, although I don't know the right interface for it.
>>
>> At IMVU, we've disabled the smoother altogether because of its problems. I
>> wouldn't be averse to ripping out the smoother altogether, but I imagine others
>> find the feature useful.
> 
> this would mean an extra parameter at RRA setup time ...
> 
> unfortunately I have not used HW in production, so this is all a
> guessing game to me ...
> 
> cheers
> tobi
>
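
The effect described in point 1 can be sketched as follows. This is an illustration under stated assumptions, not the RRDtool code: `name_hash` is a stand-in 32-bit FNV-1a hash (the hash RRDtool actually applies to the filename may differ), and `smooth_offset` is a hypothetical helper that mirrors the `hashed_name % period` expression from the quoted line.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in hash: 32-bit FNV-1a. Assumption for illustration only;
 * the hash RRDtool applies to the filename may differ. */
static uint32_t name_hash(const char *s)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */

    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: the row within the season at which the smoother
 * would run, mirroring `hashed_name % period`. */
static unsigned long smooth_offset(const char *filename, unsigned long period)
{
    assert(period > 0);
    return (unsigned long) name_hash(filename) % period;
}
```

Because the offset depends only on the name and the period, two archives with the same settings and the same data but different filenames will usually be smoothed at different points in the season, while the same filename always gives the same offset. Spreading the offsets this way would stagger the smoothing load across files, at the cost of the reproducibility problem raised above.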