[rrd-developers] Update ex post?

Sebastian Harl sh at tokkee.org
Tue Aug 24 12:44:56 CEST 2010


On Mon, Aug 23, 2010 at 05:28:03PM -0700, Thorsten von Eicken wrote:
>   On 8/23/2010 2:13 PM, Sebastian Harl wrote:
> > So, I don't see any reasonable way to solve that with the current
> > architecture (and I don't see any way how this should be improved).
> I believe there is disagreement about what the problem is. You define 
> the problem as "insert values in the past in a manner that is 
> indistinguishable from having inserted them at the correct moment in the 
> first place". Your analysis is correct for that problem.
> In real life, the problem tends to be a different one, which is "insert 
> values in the past in a manner that improves on the current form of the 
> data, i.e., that makes the resulting RRD more useful than without these 
> insertions." Put differently, I prefer some inaccuracy in the data over 
> having data gaps or garbage data in my RRD.

Yep, that's what I realized after writing most of my e-mail, and it's
why I then added "well, unless you accept rather vague approximations"
;-)

Anyway, in that case, the problem is how to (automatically) detect *how*
to make the RRD more useful. However, that responsibility could be given
to the user: RRDtool could provide a few different mechanisms for
solving it, and the user gets to choose which one to use. Something like
the following comes to mind off the top of my head:

 * Use the new value for all CDPs affected by the change. I assume that
   this would be the most commonly used approach; it should be a
   reasonable way to fix spikes in the data/graph or to remove
   undefined CDPs.

 * Use the average (min, max, $some_more_complex_function, ...) of the
   CDPs before and after the affected one and ignore the specified new
   value (or pass the new value to the function as well). This might
   also be a reasonable approach to fixing spikes and undefined values,
   but it could be used for more elaborate corrections, too.

   And while we're at it, we could also think about using more than two
   surrounding CDPs to calculate the updated CDP and, e.g., do some kind
   of function fitting. I don't think there's any actual need for that
   but it would be a cool feature and possibly fun to implement ;-)

 * Assume that the (possibly preprocessed) value of the CDP is the value
   of all PDPs that were used to create the CDP, and then apply something
   like the "replace" function I was talking about in my other e-mail to
   update the CDP. I'm not entirely sure there would be a use case for
   that, but it shouldn't be too hard to implement ;-)
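
To make the three options a bit more concrete, here is a minimal Python
sketch. It is not RRDtool code: it models a single RRA as a plain list
of CDP values, with None standing in for an undefined CDP. The function
names set_affected, from_neighbours and replace_pdps are hypothetical.
The third one assumes AVERAGE consolidation and uses a simple
"swap some PDPs and re-average" rule as a stand-in for the "replace"
function from my other e-mail, whose exact definition isn't given here.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence

CDP = Optional[float]  # None models an undefined (UNKNOWN) CDP


def set_affected(cdps: List[CDP], start: int, end: int, value: float) -> List[CDP]:
    """Option 1: overwrite every CDP in [start, end) with the new value."""
    out = list(cdps)
    for i in range(start, end):
        out[i] = value
    return out


def from_neighbours(
    cdps: List[CDP],
    index: int,
    func: Callable[[Sequence[float]], float] = mean,
    value: Optional[float] = None,
) -> List[CDP]:
    """Option 2: derive the CDP at `index` from the CDPs before and after it.

    The new value is ignored unless `value` is given, in which case it is
    passed to `func` along with the neighbours. Undefined neighbours are
    skipped; if nothing is left, the CDP stays undefined.
    """
    before = cdps[index - 1] if index > 0 else None
    after = cdps[index + 1] if index + 1 < len(cdps) else None
    inputs = [c for c in (before, after) if c is not None]
    if value is not None:
        inputs.append(value)
    out = list(cdps)
    out[index] = func(inputs) if inputs else None
    return out


def replace_pdps(cdp: CDP, pdps_per_cdp: int, value: float, replaced: int = 1) -> float:
    """Option 3 (AVERAGE consolidation only, an assumption): treat every PDP
    behind the CDP as equal to the CDP, swap `replaced` of them for `value`,
    and re-average."""
    if cdp is None:
        # Assumption: unknown PDPs are skipped, so only the new PDPs count.
        return value
    kept = pdps_per_cdp - replaced
    return (cdp * kept + value * replaced) / pdps_per_cdp


if __name__ == "__main__":
    rra = [10.0, 500.0, 12.0, None, 14.0]  # spike at index 1, gap at index 3
    print(set_affected(rra, 1, 2, 11.0))   # [10.0, 11.0, 12.0, None, 14.0]
    print(from_neighbours(rra, 3))         # [10.0, 500.0, 12.0, 13.0, 14.0]
    print(replace_pdps(500.0, 5, 0.0))     # 400.0
```

Passing func=max or func=min to from_neighbours covers the "min, max"
variants; a fitting function over more than two neighbours would slot in
the same way.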

What other real-world use-cases, other than removing spikes or undefined
values, are out there? Those are the only ones I've encountered so far.

> What many of us have built are tools that get at the second problem. It 
> would be great if the RRD maintainers could accept some of these tools 
> in the spirit of "sometimes the perfect is the enemy of the good".

So, what's the approach those tools are using (I've never looked at any
of those [nor have I looked *for* any ;-)])?


Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety.         -- Benjamin Franklin
