[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted
Mark.Seger at hp.com
Fri Jul 20 22:20:28 CEST 2007
I'd say my problems arose from the fact that I couldn't find any
description of what AVERAGE, MIN, MAX and LAST do! If it's already
documented somewhere, maybe all that's needed is a link, but I didn't see
anything.
That said, in the create manpage under RRA I'd expand the description of
the consolidation function and also include sections for each of these 4
options, in the same way there are sections for GAUGE, COUNTER, DERIVE,
ABSOLUTE and COMPUTE under DS. For example (and this is only a suggestion
building on what's already there):
"The data is also processed with the consolidation function (/CF/) of
the archive. When there is more than one primary data point to be stored
in the same row of the archive, those data points must be consolidated
into one, and you select how that consolidation is done by choosing one
of the following:
AVERAGE - all the data points are averaged.
MIN - the smallest data point is chosen.
LAST - the most recent data point is used.
MAX - the data point with the maximum value is used.
One must also realize this process is not perfect. If you have many
samples being consolidated into a single one and there is a spike or a
very low value, it will probably never be seen if you're using AVERAGE.
On the other hand, if you have a lot of small values and a single spike
being consolidated, you could get misleading results if you use MAX.
This effect can be even more noticeable when plotting, because the
default plot width is 400 pixels, so all data must fit into one of 400
points. If you have more than 400 data points to plot, some
consolidation is guaranteed to occur. The effect can be reduced by
making wider plots, but you can't escape it."
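To make the spike-loss effect described above concrete, here is a small Python sketch. It is a simulation of the four consolidation functions, not rrdtool code; the sample values are made up for illustration:

```python
# Simulate rrdtool's four consolidation functions (CFs) on one
# bucket of primary data points.  Illustration only, not rrdtool.

def consolidate(samples, cf):
    """Reduce a list of primary data points to one consolidated value."""
    if cf == "AVERAGE":
        return sum(samples) / len(samples)
    if cf == "MIN":
        return min(samples)
    if cf == "MAX":
        return max(samples)
    if cf == "LAST":
        return samples[-1]
    raise ValueError("unknown CF: %s" % cf)

# 60 one-second samples: mostly small values plus a single spike.
samples = [5.0] * 59 + [1000.0]

for cf in ("AVERAGE", "MIN", "MAX", "LAST"):
    print(cf, consolidate(samples, cf))
```

With AVERAGE the 1000 spike collapses to roughly 21.6 and effectively disappears from the archive; MAX (and here LAST, since the spike happens to be the final sample) keeps it.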
Tobias Oetiker wrote:
> Hi Mark,
> yes the 'lost' spike confuses people ... most, when they start
> thinking about it, see that rrdtool does exactly the right thing:
> it uses the consolidation method of the data being graphed to
> further consolidate for the graph ...
> so if you are using MAX as the consolidation function for the RRA, the
> grapher will use MAX too. If you are averaging the data, the
> grapher will use the same function too ...
> if you have textual suggestions for the grapher documentation I
> will be glad to include them
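Tobias's point about the grapher reusing the RRA's consolidation function can be sketched in a few lines of Python (again a simulation, not rrdtool): consolidating twice with MAX still preserves the spike, while AVERAGE-of-AVERAGE gives the same result as one big average, so the spike stays hidden either way:

```python
# Two-stage consolidation, as the grapher does when an RRA holds
# more rows than the graph has pixel columns.  Simulation only;
# the numbers are made up for illustration.

def avg(xs):
    return sum(xs) / len(xs)

# 600 one-second rates with one spike at the end.
rates = [5.0] * 599 + [1000.0]

# Stage 1: the RRA consolidates 60 rates per row -> 10 rows.
rows_max = [max(rates[i:i + 60]) for i in range(0, 600, 60)]
rows_avg = [avg(rates[i:i + 60]) for i in range(0, 600, 60)]

# Stage 2: the grapher squeezes those 10 rows into 2 pixel
# columns, reusing the SAME consolidation function per DEF.
pixels_max = [max(rows_max[i:i + 5]) for i in range(0, 10, 5)]
pixels_avg = [avg(rows_avg[i:i + 5]) for i in range(0, 10, 5)]

print(pixels_max)  # MAX-of-MAX: the 1000.0 spike survives
print(pixels_avg)  # AVERAGE-of-AVERAGE: the spike is diluted away
```

Because every bucket at each stage has the same size, AVERAGE-of-AVERAGE here equals the overall average of the underlying rates, which is why averaging can never resurrect a lost spike.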
> Today Mark Seger wrote:
>> Alex van den Bogaerdt wrote:
>>> On Fri, Jul 20, 2007 at 12:31:25PM -0400, Mark Seger wrote:
>>>> more experiments and I'm getting closer... I think the problem is the
>>>> AVERAGE in my DEF statements of the graphing command. The only problem is
>>>> I couldn't find any clear description or examples of how this works. I
>>>> did try using LAST (even though I have no idea what it does) and my plots
>>>> got better, but I'm still missing data points and I want to see them all.
>>>> Again, I have a step size of 1 second so I'd think everything should just
>>>> be there...
>>> Last time I looked, which is several moons ago, the graphing part
>>> would average different samples which needed to be "consolidated"
>>> due to the fact that one was trying to display more rows than there
>>> were pixel columns available.
>> Ahh yes, I think I see now. However, and I simply point this out as an
>> observation, it's never good to throw away or combine data points, as you
>> might lose something really important. I don't know how gnuplot does it, but
>> I've never seen it lose anything. Perhaps when it sees multiple data points
>> it just picks the maximum value. Hey - I just tried that and it worked!!!
>> This may be obvious to everyone else but it sure wasn't to me. I think the
>> documentation could use some beefing up in this area, as well as some
>> examples. At the very least I'd put in an example that shows a series that
>> contains data with a lot of values <100 and a single point of 1000, then
>> explain why you never see the spike! I'll bet a lot of people would be
>> shocked. I also wonder how many system managers are missing valuable data
>> because it's simply getting dropped.
>>> (I wrote consolidated surrounded by quotation marks because it isn't
>>> really consolidation that's happening)
>>> In other words: unless your graph is 50k pixels wide, you will have
>>> to select which 400 out of 50k rates you would like to see, or you
>>> will have to deal with the problem in a different way. For example:
>>> If you set up a MAX and a MIN RRA, and you carefully craft their
>>> parameters, you could do something like this:
>>> * Consolidate 60 rates (1 second each) into one (of 60 seconds).
>>> This means setting up an RRA with steps-per-row 60.
>>> * Display 400 x 60 seconds on a graph (or adjust the graph width,
>>> together with the amount of CDPs to plot).
>>> * Do this using (you fill in the blanks):
>>> (That first area does not plot anything, and it is not supposed to.
>>> The second area displays a line from min to max.)
>>> * Do the same for 3600 steps per row, and 400x3600 seconds per graph
>>> and so on. Of course you can adjust the numbers to your liking.
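Under the assumptions in Alex's recipe (one-second primary data, 60 steps per row, MIN and MAX RRAs), the min/max band idea can be simulated in Python: each 60-second row keeps a [min, max] range, so even a one-second spike stays visible as the top of its band:

```python
# Simulate MIN and MAX RRAs with 60 steps per row, as in the
# recipe above.  Each consolidated row keeps the range
# [min, max] of its 60 one-second rates; the graph would draw
# an area from min to max per row.  Simulation only.

def build_band(rates, steps_per_row=60):
    """Return one (min, max) pair per consolidated row."""
    band = []
    for i in range(0, len(rates), steps_per_row):
        bucket = rates[i:i + steps_per_row]
        band.append((min(bucket), max(bucket)))
    return band

# Three minutes of data: quiet, one 1-second spike, quiet again.
rates = [5.0] * 90 + [1000.0] + [5.0] * 89

for lo, hi in build_band(rates):
    print("band: %.1f .. %.1f" % (lo, hi))
```

In actual rrdtool graph statements this would presumably be a DEF against the MIN RRA, a DEF against the MAX RRA, an invisible AREA for the minimum and a STACKed AREA for max minus min on top of it (matching Alex's description of the two areas above) - but check the graph manpage for the exact syntax, since Alex intentionally left those blanks for the reader.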