[rrd-developers] Unexpected behaviour of PREDICT and PREDICTSIGMA

Martin Sperl rrdtool at martin.sperl.org
Sat Apr 26 12:12:53 CEST 2014


Yes, it is (somewhat) intended, because you may also want to take the "current" value and its window (i.e. the last w seconds) into account for the estimation.
Obviously this selection has more of an impact on the sigma calculation than on the prediction itself - especially when the current value is "outside" of the expected behavior...

That is why I have provided the long format, where you define the offsets of your choice as absolute offsets (which also allows you to give some days - say 7 days ago - a higher weight than others...)

The negative option was more of a shortcut for 0, -1*s, -2*s, ..., -n*s, which was my focus.

But as for your "intuitive" argument: I would at least want -7*s to be included when I say n=-7, not only down to -6*s, so this is an off-by-one bug...
Still, I would include the "current" window as part of the sample - especially for the sigma part (and if you do it, then you need to do it the same way for both...)
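To make the off-by-one concrete, here is a small sketch using the shortcut form as I understand it (the multiplier 86400, the window 1800 and the variable name x are placeholders I am picking for illustration):

CDEF:y=86400,-7,1800,x,PREDICT

(with the current code this collects its samples at the shifts 0, -86400, -172800, ..., -518400 - i.e. from "now" back to 6 days ago - whereas the "intuitive" reading quoted below would be -86400 ... -604800, i.e. 1 to 7 days ago)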

You can still implement the behavior of your choice using the explicit approach:
CDEF:y=-604800,-604800,-518400,-432000,-345600,-259200,-172800,-86400,8,1800,x,PREDICT
(here the same day a week ago gets a higher weight by being included twice, and the value of "now" has no influence)
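As a sketch of what "doing it the same way for both" looks like in practice (the names pred, sigma, upper, lower and the factor 2 are examples of my choosing, not fixed names), the same offset list is simply repeated for PREDICTSIGMA and combined into bands:

CDEF:pred=-604800,-518400,-432000,-345600,-259200,-172800,-86400,7,1800,x,PREDICT
CDEF:sigma=-604800,-518400,-432000,-345600,-259200,-172800,-86400,7,1800,x,PREDICTSIGMA
CDEF:upper=pred,sigma,2,*,+
CDEF:lower=pred,sigma,2,*,-

(upper/lower are then the PREDICT+X*SIGMA / PREDICT-X*SIGMA boundaries mentioned further down, here with X=2, which can be graphed or used for alerting)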

Ciao, Martin

P.S.: For our practical implementations we have switched to the explicit approach (as it allows giving a higher weight to the same day a week ago...)
So for our use case we are not really affected by any change in this code...

As I think about it: under some circumstances a PREDICT_MEDIAN would also be a good thing to use instead of calculating the average with PREDICT.

Obviously MEDIAN and SIGMA do not mix well, but that way a "one-off" (say an issue on one day last week) has less influence on the prediction than with the PREDICT_AVERAGE version.
On the other hand it favours older data for any increasing/decreasing trend, so it is more likely to over/underestimate - especially if you make predictions on data where the values increase/decrease by a factor of 2 in 6 months, with a prediction over 6 steps of say 30 days and a window of 7 days.

To get upper/lower boundary graphs and alerting based on those, the equivalent of PREDICT+X*SIGMA, PREDICT, PREDICT-X*SIGMA would be the calculation of "percentiles" - so PREDICT_PERCENTILE(75), PREDICT_MEDIAN, PREDICT_PERCENTILE(25).
(note that PREDICT_MEDIAN = PREDICT_PERCENTILE(50), which would make the code easier, but maybe with PREDICT_MEDIAN as an alias/shortcut...)
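Purely as an illustration of that idea (neither operator exists today, and where the percentile argument would sit in the RPN expression is only a guess of mine), the bands might then be written along the lines of:

CDEF:upper=-604800,-518400,-432000,-345600,-259200,-172800,-86400,7,1800,75,x,PREDICT_PERCENTILE
CDEF:mid=-604800,-518400,-432000,-345600,-259200,-172800,-86400,7,1800,x,PREDICT_MEDIAN
CDEF:lower=-604800,-518400,-432000,-345600,-259200,-172800,-86400,7,1800,25,x,PREDICT_PERCENTILE

(same shifts and window as for PREDICT, just replacing the average over the collected samples with their 75th/50th/25th percentile)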

Note that the sorting of values required for MEDIAN/PERCENTILE is even more CPU-intensive than the calculation of averages...
On 26.04.2014, at 11:02, Steve Shipway <s.shipway at auckland.ac.nz> wrote:

> So, assuming stptr is zero-based, this means my analysis of the behaviour of negative shift counts is correct.
> 
> My question, though, is more to ask if this was intended by design, or if it is a 'feature'?
> 
> Intuitively, I would have expected this:
> 
> CDEF:y=s,-n,x,PREDICT
> 
> to result in shifts of s, 2s, 3s, .... ns
> 
> However, it actually results in shifts of 0, s, 2s, ... (n-1)s
> 
> Either way, it can't be changed now as it would potentially alter existing behaviour... 
> 
> Steve
> 
> Steve Shipway
> University of Auckland ITS
> UNIX Systems Design Lead
> s.shipway at auckland.ac.nz
> Ph: +64 9 373 7599 ext 86487
> 
> 


