[rrd-users] Re: About logging the REAL values

Philip Molter philip at datafoundry.net
Wed Aug 23 04:33:57 MEST 2000


On Wed, Aug 23, 2000 at 11:00:20AM +1000, George Dau wrote:
: 
: My issue is that I have a hard time explaining to managers that 0.4 of an
: error count is a valid item. They do not believe me, and I am still
: unconvinced too. I really do want the number of errors since the last
: sample, not some averaged, munged figure. Tobi is right: I have the source
: code, so I can change it to do what I want.
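
For what it's worth, the fractional figure isn't an accident; it falls
straight out of the rate normalization rrdtool does at update time.  A
rough sketch of that splitting in Python, with made-up numbers and none
of the real heartbeat or unknown handling:

    # Illustration only -- not rrdtool's actual code.
    STEP = 300  # seconds per primary data point

    def split_delta(prev_t, prev_v, t, v):
        """Spread one counter delta across the step buckets it covers."""
        rate = (v - prev_v) / (t - prev_t)        # errors per second
        buckets = {}
        cur = prev_t
        while cur < t:
            boundary = (cur // STEP + 1) * STEP   # end of the step holding cur
            end = min(boundary, t)
            buckets[boundary] = buckets.get(boundary, 0.0) + rate * (end - cur)
            cur = end
        return buckets

    # Two errors counted 290 seconds after the previous poll, straddling
    # a step boundary: the whole-number delta comes back fractional.
    print(split_delta(prev_t=890, prev_v=10, t=1180, v=12))
    # {900: 0.0689..., 1200: 1.9310...}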

Yes, I have the source code too, and I could make the changes I
want, but this is why we have mailing lists: to discuss such ideas.
I'm sure plenty of other people would love access to these sorts of
changes, as well as a say in how exactly they should work, and
keeping the source code as one cohesive lump would certainly be
nice.

Personally, I don't see why RRD /cannot/ store data at the times it
was entered rather than at fixed intervals.  If maintaining an evenly
interpolated data set is of utmost importance for some application,
move the interpolation out into the retrieval routines (fetch, graph,
etc.), where it's much easier to alter data in a temporary way.
Doing so also means you can have run-time options that decide whether
to grab the data the exact way or the averaged way.  /Both/ ways are
useful in the majority of environments where RRD is used, but because
of the way the data is stored, you're currently limited to just one
of them.
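
Just to make that concrete, here is a toy sketch of what I mean in
Python.  The names and structure are entirely made up -- this is not a
proposal for rrdtool's actual internals -- but it shows the samples
being stored exactly as they arrive, with the fetch call deciding
whether to hand back the raw points or an averaged, rebinned view:

    import bisect

    class RawStore:
        """Hypothetical 'store raw, rebin on fetch' store (not RRD)."""

        def __init__(self):
            self.samples = []          # sorted list of (timestamp, value)

        def update(self, ts, value):
            # Keep exactly what was handed in, at the time it was handed in.
            bisect.insort(self.samples, (ts, value))

        def fetch(self, start, end, step=None):
            """step=None -> exact samples; step=N -> N-second averaged bins."""
            window = [(t, v) for t, v in self.samples if start <= t <= end]
            if step is None:
                return window          # the "exact" way
            bins = {}
            for t, v in window:        # the "averaged" way, done at read time
                b = start + ((t - start) // step) * step
                bins.setdefault(b, []).append(v)
            return [(b, sum(vs) / len(vs)) for b, vs in sorted(bins.items())]

The point is just that the averaged view becomes a read-time option
rather than a storage-time commitment.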

I'm sure Tobi can give a much clearer analysis of the problem,
though.  Moving the interpolation code out into the retrieval
routines may severely impact the performance of RRD operations.
So where's the trade-off?

If you're in an environment where you're polling a large number of
machines every minute or two and storing vast amounts of data (as an
ISP could very well be doing), not only do you have the incredible
amount of I/O activity that comes with storing the data, you could
also have a lot of CPU utilization as the interpolation for all those
data entries takes place (I don't know how efficient the interpolation
routines are).  In that same environment, your graphing is /generally/
going to be things like day graphs and last-five-minute status checks,
so interpolation done at retrieval time would run over a very limited
supply of data.  Any performance decrease there will probably be
negligible compared to what you save during the gathering phase.  I
say 'probably', though; Tobi would know more about this than I do,
since right now I'm just an end user.

Contrast that with a scenario where you have relatively few data
sets but do a lot of data processing.  There you're probably very
happy with the way things work now.

I can say with certainty, though, that the ability to store precise
data at exact times, even if it's offered in addition to the current
features (i.e. an add-on, not a rewrite or rethink), can only add to
RRD's usefulness.

* Philip Molter
* Data Foundry International
* http://www.datafoundry.net/
* philip at datafoundry.net
