[rrd-users] Re: Fup: About logging the REAL value

Wed Aug 23 22:43:22 MEST 2000

Today you sent me mail regarding [rrd-users] Re: Fup: About logging the...:

*> (Note that my discussion below is aware of the fact the data is OBSERVED
*> data collected at specific points in time and that the value being
*> observed might change freely between data collections.  The entire
*> discussion below is about how to treat that specific data in some better
*> ways).
*> 
*> I don't have much problem with interpolation of values to get AVERAGE
*> values to write at a specific time.  In the "number of used modems"
*> case, you can tell the "customer" of the MRTG pages that the "current"
*> value of 15.253 modems shows up because the "current" time
*> (say 12:05) isn't exactly the time when the server was queried for
*> modem data (which happened at perhaps 12:04 or 12:06).  (Actually, if
*> MRTG/RRDtool used the word "Latest" instead of "current", the users
*> would perhaps be less confused.)

you might even use CDEF in rrdtool graph and make the LAST value an
integer ... no harm is done because the 'real' data is saved in
rrdtool

*> So doing interpolation of data to get 5 min samples is fine for me as
*> long as it is intended only for AVERAGE calculations and not MAX
*> calculations.

I see ... we are not done yet ... there is no such thing as a MAX
in a 5 minute interval. If 5 minutes is your resolution you cannot
see more than the 5 minute averages ... else you are not doing 5
minute intervals ... if you can see into the interval then you are
working at a different resolution, let's say 30 seconds ... you thus have
to configure the rrd database accordingly and you will get the
higher resolution ...
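
To make the resolution point concrete, here is a minimal Python sketch (not
rrdtool code). It assumes a 30 second step, so ten primary values make up one
5 minute row, and it collapses them once with an average and once with a
maximum. The readings and the helper names (consolidate, average) are
hypothetical and chosen only for illustration.

```python
# Hypothetical 30-second readings of "modems in use" covering one 5-minute span.
# With a 30 s step, ten primary values make up one 5-minute row.
readings_30s = [20, 21, 24, 22, 20, 19, 18, 18, 17, 16]


def consolidate(values, per_row, func):
    """Collapse each complete group of per_row values into one value using func."""
    return [func(values[i:i + per_row])
            for i in range(0, len(values) - per_row + 1, per_row)]


def average(group):
    """Plain arithmetic mean of one group of values."""
    return sum(group) / len(group)


five_min_avg = consolidate(readings_30s, 10, average)
five_min_max = consolidate(readings_30s, 10, max)
print(five_min_avg, five_min_max)
```

In this model the 5 minute maximum is an integer only because the finer
30 second values were stored in the first place; with 5 minute primary values
there is nothing inside the interval left to take a maximum of.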

*> What I actually do have a problem with is the fact that the
*> "maximum observed" values for the modems come out as values like 24.45,
*> which is wrong!  All the SNMP queries for the number of used modems
*> return integer values.  Thus the maximum value among these values is an
*> integer value too.  And that value is the most accurate "Maximum
*> observed" value IMHO.

if we agree that you have to make your observation every 5 minutes
(sharp) then you will never have 24.45 in the database ... on the
other hand, if you fail to take the sample at the specified time and
end up with a sample taken a minute later, then there is no knowing
exactly what the max observed value at the 5 minute mark would
have been. rrdtool makes a best effort by integrating the data you
gave it and builds a value which, at least over the long run,
does not distort the data. Now, if the data you are monitoring
fluctuates so massively that it can have any value from 0 to max
at any point in time, then it is random data and monitoring makes no
sense ... also, if the data fluctuates a little less, it might still
be necessary to monitor more frequently than every 5 minutes ...
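
The effect of a late sample can be sketched in a few lines of Python. This is
a simplified model, not rrdtool's actual code: it assumes that each reported
value is treated as holding for the whole time since the previous update, and
that the value stored for each 5 minute step is the time-weighted average over
that step. The function name resample and the sample data are hypothetical.

```python
def resample(updates, step):
    """Time-weighted average per fixed step.

    updates: sorted list of (timestamp, value); the first entry only marks
    the start time. Each later value is assumed to hold for the whole span
    since the previous update (a simplifying assumption).
    Returns a list of (step_end, average) for every complete step.
    """
    start = updates[0][0]
    end = updates[-1][0]
    result = []
    bucket_start = start
    while bucket_start + step <= end:
        bucket_end = bucket_start + step
        total = 0.0
        for (t0, _), (t1, value) in zip(updates, updates[1:]):
            # Seconds of the span t0..t1 that fall inside this step.
            overlap = min(t1, bucket_end) - max(t0, bucket_start)
            if overlap > 0:
                total += value * overlap
        result.append((bucket_end, total / step))
        bucket_start = bucket_end
    return result


# Integer readings only, but the third sample arrives 60 s late (660, not 600).
updates = [(0, 24), (300, 24), (660, 25), (900, 24)]
print(resample(updates, 300))
```

With these inputs the first two steps come out as whole numbers, while the
step ending at 900 mixes 60 seconds of 25 with 240 seconds of 24 and ends up
at 24.2, which is how a fractional modem count can appear from integer input.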

The max observed value thing is only interesting if you can
observe constantly ... and then you can just as well enter the value
with a fake time, because what you are saying is: I have now had a
look at the data over the course of the last 5 minutes and the max
I saw was 7 ... there is no point in entering this kind of data
before the observation interval has ended ...
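
A minimal Python sketch of that idea, assuming you collect observations
yourself between updates: group them by 5 minute interval and emit a single
value per interval, stamped with the time the interval ends. The function name
max_per_interval and the observation data are hypothetical, and the choice to
count an observation taken exactly on a boundary towards the following
interval is an assumption of this sketch.

```python
def max_per_interval(observations, step):
    """Return (interval_end, max_seen) for each interval that has observations.

    observations: iterable of (timestamp, value).
    An observation at time t is assigned to the interval
    [k*step, (k+1)*step) and reported with the interval's end time.
    """
    maxima = {}
    for t, value in observations:
        interval_end = (t // step + 1) * step
        if interval_end not in maxima or value > maxima[interval_end]:
            maxima[interval_end] = value
    return sorted(maxima.items())


# Observations made at irregular times within two 5-minute intervals.
observations = [(10, 3), (120, 7), (290, 5), (310, 4), (450, 6)]
print(max_per_interval(observations, 300))
```

Each resulting pair is what would be entered once the interval has ended:
one timestamp on the interval boundary and the largest value seen inside it.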

*> I think this OMAX/OMIN feature should be implemented and put into the
*> RRDtool distribution because:
*> 
*> 1.) I believe (or rather guess) it is not hard to implement nor messes
*> up the code badly nor breaks all the fundamentals of the Round Robin
*> Database (But I doubt I know what I talk about here as I've not READ the
*> code :-).

it would not be difficult

*> 2.) I don't consider adding these two DSs to complicate the user
*> interface to rrdtool very much.

it would not

*> 3.) It does not require any post processing of the data once these OMAX,
*> OMIN DS are written into the file.

correct

*> 4.) It does not mess with the data.  It just picks another interpolation
*> strategy which in some circumstances provides more accurate data to the
*> end users.

correct ... and it will further confuse the users who seem to have
a hard time grasping the concept as it stands ...

*> End result will be that RRD files still have all data sampled to the
*> fixed time intervals but some of them really have the min and max
*> observed values.

the new thing will be that there will be two kinds of data in
rrdtool, data which has been properly preprocessed and averaged to
a defined interval ... and random data ... observed by chance ...

*> Tobi, if I write the code for what I suggest above (and do that with
*> cleaner code than I provided in my earlier attempts to contribute ;-)
*> would you mind including it?

yes, I would mind ... (not because of your code but because of the feature ...)

I launched this discussion to see if there was any convincing
argument for having the pre-averaging modified or enhanced ... I
have not read one such argument ... 

cheers
tobi

PS: still waiting for that super duper image cutting code .. :-)

-- 
 ______    __   _
/_  __/_  / /  (_) Oetiker, Timelord & SysMgr @ EE-Dept ETH-Zurich
 / // _ \/ _ \/ / TEL: +41(0)1-6325286  FAX:...1517  ICQ: 10419518 
/_/ \.__/_.__/_/ oetiker at ee.ethz.ch http://ee-staff.ethz.ch/~oetiker
