[rrd-users] rrdtool - features ... 2nd y-axis

Sat May 31 23:44:52 CEST 2008

Tobias Oetiker wrote:
>>> what is it that you try to achieve ? with integers for data storage ?
> 
>> So my wishlist would be:
>> * add the ability to store any common data value (8,16,32,64bit int
>>   and single/double floats) - to use the best match for the purpose.
>>
>> I fully understand that this doesn't strike you as a very rewarding
>> venture - I'm just thinking about the feasibility of doing it myself.
> 
> hmmm ok ... I guess the fundamental problem we are striking here,
> is the 'get back exactly what I put in' bit. This is contrary to
> rrdtool's philosophy of enabling you to put in data whenever you
> manage to sample it, and then adjusting for the time jitter. This
> does not work with integers ...

Answering Tobi's and Alex's posts in one mail, because separate replies
would be redundant.

Tobi, ok, I assume this is the same problem as almost always:
You happen to find a great tool that does almost everything you want
but it seems impossible to get the last missing bit ...

So please don't get me wrong, this is not meant as criticism of
the tool - I'm just trying to understand why things are the way they are
and whether it might make sense to put effort into changing them.

> now, floats vs doubles, this might be another matter, but integers
> will just cause so much trouble since the results will be different
> from what people expect them to be ...

this is what happened to me, since I can't get back exactly what I put
in ...

> that I fear an explosion of unhappy users who managed to totally mess
> up their data by storing it as integers ...

well, if there were a command-line switch for rrdcreate to select the
data type or - even better - the possibility to select the data type
for each DS (with a default of double), I don't see that problem.

for the graphing you would need floating point *calculation* anyway, but
I can't see why it would be a problem to *store* the values as integers.
Ok, there would be some loss with consolidation, but when I decide to
use consolidation I accept (and expect) manipulation of the data. But
at least the data stored in the first RRA would be accurate.
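
To make that concrete, here is a minimal sketch in Python (not rrdtool
code; the 16-bit layout and the sample values are assumptions made up for
illustration). It shows that integers come back exactly from an integer
slot, while an AVERAGE over them generally does not fit one:

```python
import struct

samples = [3, 7, 7, 10, 0, 1]  # raw integer readings (made-up values)

# Store as unsigned 16-bit integers and read them back unchanged.
packed = struct.pack("<6H", *samples)
restored = list(struct.unpack("<6H", packed))

# Consolidate neighbouring pairs with an average, as an AVERAGE RRA would.
averages = [(a + b) / 2 for a, b in zip(samples[0::2], samples[1::2])]

# Forcing the averages back into integer slots truncates the fraction.
truncated = [int(v) for v in averages]
```

Here the first-level data is exact, and only the consolidated values lose
their fractional part when squeezed into integers.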

Especially if you plan to use the rrd not only for graphing but also for
logging, storing and archiving the data (over a long period of time), it
might be important to
a) store exact data values and
b) use as little disk space as possible

perhaps that's the point where I'm stuck with rrdtool (for this
project) and I might have to look for something
different - however, rrdtool is doing such a great job in the ease of use
(especially when graphing) that I'm sure that any other tool wouldn't be
that good ...

this is why I thought it might be worth thinking about adding the missing
functionality to it instead of thinking about programming (what I need)
completely from scratch.

I guess it wouldn't be *that* hard to do the rrd part with different data
types, however, the graphing with all the CDEF and VDEF and RPN will be
tough ... that's why I wonder if it might be possible/feasible to make
the required changes in the database part and keep the graphing ...

Alex van den Bogaerdt wrote:
> On the other hand, having multiple choices in the code, at various
> places in the code, will have a negative impact on CPU utilization
> and processing time.
>
> For example (and yes, this is a simplified example):
>
> a:  read 400 rates -> read 8*400 bytes
>     show 400 doubles -> 400 times FP computation to get y-axis value
>
> b:  read 400 rates ->
>       if datatype==nibble
>       then read 400/2 bytes
>       else if datatype==byte
>       then  read 400 bytes
>       else if datatype==longint
>       then read 4*400 bytes
>       else if datatype==double
>       then read 8*400 bytes
>       else read 16*400 bytes
>     show 400 doubles ->
>       if datatype==nibble 400 times find proper halfbyte and do int computation
>       else if (datatype==byte) or (datatype==longint) 400 times int computation
>       else 400 times FP computation
>
> Yes, you save a few disk blocks... but I think I still prefer (a) over (b).

Alex, well, first you know as well as I do that it isn't that complicated:

   read 400 rates ->
     read 400 * sizeof(DS_datatype)

since determining the size of the datatype needs to be done only once for all
the 400 values, the computation overhead is next to nothing. However, if storing
*many* values for a *long* period of time, the saved amount of disk space could
be immense - quite apart from the 'get back exactly what I put in' point.
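
To put rough numbers on this, a minimal Python sketch (again not rrdtool
code; the type names and the functions `read_rates` and `bytes_needed` are
made up for illustration). The element size is looked up once per read, not
once per value:

```python
import struct

# Hypothetical DS type names mapped to struct format characters.
# With the "<" prefix, struct uses fixed standard sizes (1, 2, 4, 8 bytes).
FORMATS = {"int8": "b", "int16": "h", "int32": "i", "int64": "q",
           "float": "f", "double": "d"}

def bytes_needed(ds_type, count):
    """Bytes occupied by `count` values of the given type."""
    return struct.calcsize("<%d%s" % (count, FORMATS[ds_type]))

def read_rates(buf, ds_type, count):
    """Decode `count` values of one type; the size is determined once."""
    fmt = "<%d%s" % (count, FORMATS[ds_type])
    nbytes = struct.calcsize(fmt)
    values = struct.unpack(fmt, buf[:nbytes])
    # Hand everything to the caller as floats (doubles in Python).
    return [float(v) for v in values], nbytes
```

For 400 values, `bytes_needed("int8", 400)` is 400 bytes against 3200 bytes
for `bytes_needed("double", 400)`.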

There is nothing wrong (except for wasting some CPU cycles) in doing all the
*processing* in the biggest representation of int and float, so for processing
it only needs a 'wrapper' function between database retrieve (let's say nibble)
and processing in 64bit int, same thing for float -> double.
Just for storing I hate to waste many bytes for values that always stay between
0 and 10 (for example) or even worse 0 and 1.
For any type of consolidation and graphing it might be ok to always convert
everything to double since modification of data is expected when consolidation
happens. So basically only the store and retrieve functions would have to support
various data types of 1,4,8,16,32,64bit int and 32 and 64bit float - for all other
processing the wrapper to the store/retrieve interface could do the conversion
from any to double back and forth.
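
What I have in mind for such a wrapper, as a rough Python sketch (the class
`TypedColumn` and its methods are hypothetical and have nothing to do with
the actual rrdtool source): narrow types on disk, doubles for everything
that processes the data.

```python
import struct

class TypedColumn:
    """Hypothetical store/retrieve wrapper: narrow type in storage,
    double precision for all other processing."""

    def __init__(self, code):
        self.code = code                      # struct char, e.g. "B" = uint8
        self.is_int = code not in ("f", "d")
        self.data = bytearray()               # stands in for the file

    def store(self, value):
        if self.is_int:
            raw = int(value)
            # Refuse values that an integer column cannot hold exactly.
            if raw != value:
                raise ValueError("not an integer: %r" % (value,))
        else:
            raw = float(value)
        # struct.pack raises struct.error if raw is out of range.
        self.data += struct.pack("<" + self.code, raw)

    def retrieve(self):
        size = struct.calcsize("<" + self.code)
        count = len(self.data) // size
        fmt = "<%d%s" % (count, self.code)
        # Everything leaves the wrapper as a float (a C double).
        return [float(v) for v in struct.unpack(fmt, bytes(self.data))]
```

With `col = TypedColumn("B")`, storing 0, 1 and 10 takes three bytes, and
`col.retrieve()` hands back `[0.0, 1.0, 10.0]` to the graphing side.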

@Tobias:
can you tell me if it's possible to use such a wrapper function? That means there
should be (at best) only one single point in the code that does store & retrieve?

Or am I totally wrong with my thoughts?

If you're willing to continue discussing that option, I'm also happy to take it
off-list, since it might be boring for many others here ...

Tobias & Alex, many thanks for your time and your comments.

cheers

- Karl