[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Sat Jul 21 00:49:52 CEST 2007

The thing that's interesting about this whole situation is that on one 
level rrd appears to draw a cleaner graph and the gnuplot one looks a 
little fuzzier, but I also think the gnuplot provides valuable 
information that gets lost, and probably missed with rrd.  If my 
examples were disk performance numbers rrd could have led someone to the 
conclusion that everything was running just fine at a load of 20 while 
gnuplot shows there's really a range from 0 to 20 and things are not 
fine.  If you zoom into the rrd data you definitely can see the 
details of the drop off, but my fear is how many people would bother.  
They would see the day long data and think everything is fine.

But I do have a possible solution and I think you may have hit on it 
when you said gnuplot plots all three values in your example.  In fact I 
regenerated the same plot with gnuplot using unconnected points here 
http://webpages.charter.net/segerm/27818-n1044-20050814-cpu.png and you 
can see multiple points stacked on top of each other.  Might there be 
such a possible plotting option with rrd?  In other words what if there 
was a plot type such that rather than consolidate data you print a bar 
between the low and high values for that single interval?  At the very 
least that would give someone a more accurate picture (if they want it) 
of what is going on.  Naturally I have no clue as to how easy that would 
be or how it would fit into the overall scheme of how rrd does its 
thing, but it could help address a lot of confusion over why the plots 
don't always agree with the data.
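
To make the idea concrete, here is a minimal Python sketch (plain Python, not rrdtool code) of what such a plot type would have to compute: the low, average and high of the samples that land in each pixel column. The function name `consolidate_columns` and the equal-sized bucketing are assumptions made for illustration only; rrdtool's own grapher may group samples differently.

```python
def consolidate_columns(samples, width):
    """Group samples into `width` pixel columns.

    Returns one (low, average, high) tuple per non-empty column.
    Equal-sized buckets are an assumption for this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(samples)
    columns = []
    for col in range(width):
        start = col * n // width
        end = (col + 1) * n // width
        bucket = samples[start:end]
        if not bucket:
            continue  # more columns than samples: nothing lands here
        columns.append((min(bucket), sum(bucket) / len(bucket), max(bucket)))
    return columns


# The three samples from the quoted example, all mapped to one pixel column.
print(consolidate_columns([10, 100, 990], 1))

# Two different inputs with the same low and high: a low-to-high bar looks
# identical for both, and only the average tells them apart.
print(consolidate_columns([10, 10, 990], 1))
print(consolidate_columns([10, 990, 990], 1))
```

A bar drawn from the first to the third element of each tuple would show the range per column, and the middle element could still be drawn as a line on top of it.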

-mark

Tobias Oetiker wrote:
> Mark,
>
> think about this:
>
> you give rrdtool 3 samples with the following values:
>
>
> 10, 100, 990
>
> now you ask rrdtool to plot the data. And it happens all three
> values get mapped to 1 pixel.
>
> If you pick AVERAGE the result will be about 367
> If you pick MAX it will be 990
> If you pick MIN it will be 10
>
> What gnuplot does, is that it just draws ALL the data values. This
> is why you get these strange, wide color areas ...
>
> In gnuplot you would get the same result for the following two sets
> of input data
>
> 10,10,990
> 10,990,990
>
> in rrdtool the MIN and MAX would be equivalent too, but the AVERAGE
> would be the AVERAGE and not just a wider area.
>
> So now, I wonder why you think the gnuplot approach is accurate ...
>
>
> If you want to show the range of values that went into a plot, I
> suggest, you plot
>
> MAX
> AVERAGE
> MIN
>
> this will give the user a clearer picture of the values
> consolidated.
>
> hope this helps
>
> cheers
> tobi
>
>
>
>
>
> Today Mark Seger wrote:
>
>   
>> ok, I really hate to be a pain but I really want to get this right too and
>> have spent almost 2 solid days trying to understand this and am still puzzled!
>> I think rrd is pretty cool but if it's not the right tool to be using, just
>> tell me.  I believe when doing system monitoring it's absolutely vital to
>> report accurate data.  I also understand some people are satisfied with
>> averages but not me, especially if I'm trying to do benchmarks or accurately
>> measure performance.
>>
>> Now for some details - I've collected data in 10 second samples for an entire
>> day.  I created a rrd database with a step of 10 and capable of holding 8640
>> samples so as I understand it all data should be recorded accurately and I
>> believe it is.  Then I loaded it up with data.  I then examined the data by
>> doing a 'fetch' and verified the correct values are stored.  When I plot the
>> data using MAX, I'm missing a lot of low values and I understand it's because
>> I have 8600 samples and a plot that's only 800+ pixels wide, but what I'd like
>> to try and understand is does anyone know how gnuplot gets it right?  I'm
>> looking at a gnuplot plot of the same data and it shows a much better
>> representation of what is going on with the network.
>> have a look at http://webpages.charter.net/segerm/27728-n1044-20050814-cpu.png
>> which was generated by gnuplot and now look at
>> http://webpages.charter.net/segerm/cpu.png which was generated by rrdtool for
>> just the cpu-wait data.  this rrd plot implies there is a constant wait of
>> about 20% while the other plot clearly shows it fluctuating between almost 0
>> and 20.  What I want to know is how is it that gnuplot can do this and rrdtool
>> can't?  They both have about the same number of pixels to play with so I'm
>> guessing gnuplot is doing some more sophisticated consolidation and whatever
>> that is, I'd like to suggest rrdtool offer that as an additional option.  I
>> should also point out in fact the gnuplot plot is only 640 pixels wide to
>> rrd's 881 so even though rrd has more pixels to play with gnuplot does a
>> better job.
>>
>> please understand I'm only trying to understand what's happening and see if
>> there's a way to improve rrd's accuracy because if people are relying on it to
>> reflect an accurate picture of their environment, I think this is pretty
>> important.
>>
>> anyone else care to comment?
>>
>> -mark
>>
>> Tobias Oetiker wrote:
>>     
>>> Hi Mark,
>>>
>>> yes the 'lost' spike confuses people ... most, when they start
>>> thinking about it, see that rrdtool does exactly the right thing,
>>> it uses the consolidation method of the data being graphed to
>>> further consolidate for the graph ...
>>>
>>> so if you are using MAX as consolidation function for the RRA, the
>>> grapher will use MAX too. If you are averaging the data, the
>>> grapher will use the same function too ...
>>>
>>> if you have textual suggestions for the grapher documentation I
>>> will be glad to include them
>>>
>>> thanks
>>> tobi
>>> Today Mark Seger wrote:
>>>
>>>
>>>       
>>>> Alex van den Bogaerdt wrote:
>>>>
>>>>         
>>>>> On Fri, Jul 20, 2007 at 12:31:25PM -0400, Mark Seger wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> more experiments and I'm getting closer...  I think the problem is the
>>>>>> AVERAGE in my DEF statements of the graphing command.  The only problem
>>>>>> is I couldn't find any clear description or examples of how this works.
>>>>>> I did try using LAST (even though I have no idea what it does) and my
>>>>>> plots got better, but I'm still missing data points and I want to see
>>>>>> them all.  Again, I have a step size of 1 second so I'd think everything
>>>>>> should just be there...
>>>>>>
>>>>>>
>>>>>>             
>>>>> Last time I looked, which is several moons ago, the graphing part
>>>>> would average different samples which needed to be "consolidated"
>>>>> due to the fact that one was trying to display more rows than there
>>>>> were pixel columns available.
>>>>>
>>>>>
>>>>>           
>>>> Ahh yes, I think I see now.  However, and I simply point this out as an
>>>> observation, it's never good to throw away or combine data points as you
>>>> might lose something really important.  I don't know how gnuplot does it
>>>> but I've never seen it lose anything.  Perhaps when it sees multiple data
>>>> points it just picks the maximum value.  hey - I just tried that and it
>>>> worked!!!  This may be obvious to everyone else but it sure wasn't to me.
>>>> I think the documentation could use some beefing up in this place as well
>>>> as some examples.  At the very least I'd put in an example that shows a
>>>> series that contains data with a lot of values <100 and a single point of
>>>> 1000.  Then explain why you never see the spike! I'll bet a lot of people
>>>> would be shocked.  I also wonder how many system managers are missing
>>>> valuable data because it's simply getting dropped.
>>>>
>>>> -mark
>>>>
>>>>         
>>>>> (I wrote consolidated surrounded by quotation marks because it isn't
>>>>> really consolidating what's happening)
>>>>>
>>>>> In other words: unless your graph is 50k pixels wide, you will have
>>>>> to select which 400 out of 50k rates you would like to see, or you
>>>>> will have to deal with the problem in a different way. For example:
>>>>>
>>>>> If you set up a MAX and MIN RRA, and you carefully craft their
>>>>> parameters, you could do something like this:
>>>>>
>>>>> * Consolidate 60 rates (1 second each) into one (of 60 seconds).
>>>>>   This means setting up an RRA with steps-per-row 60.
>>>>> * Display 400 x 60 seconds on a graph (or adjust the graph width,
>>>>>   together with the amount of CDPs to plot).
>>>>> * Do this using (you fill in the blanks):
>>>>>     DEF:MyValMin=my.rrd:minrra:...
>>>>>     DEF:MyValMax=my.rrd:maxrra:...
>>>>>     CDEF:delta=MyValMax,MyValMin,-
>>>>>     AREA:MyValMin
>>>>>     AREA:delta#FF0000:values:STACK
>>>>>   (That first area does not plot anything, and it is not supposed to.
>>>>>   The second area displays a line from min to max.)
>>>>> * Do the same for 3600 steps per row, and 400x3600 seconds per graph
>>>>>
>>>>> and so on.  Of course you can adjust the numbers to your liking.
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>>           
>>>       
>>     
>
>
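
The MIN/MAX RRA recipe quoted above can be mimicked in a few lines of plain Python (again, not rrdtool itself). This is only a sketch: the helper `consolidate` and the sample data are made up for illustration, with one spike of 1000 among values below 100, as in the documentation example suggested earlier in the thread. It collapses 60 one-second rates into one row with MIN, MAX and AVERAGE, then computes the same `delta` as the CDEF.

```python
def consolidate(rates, steps_per_row, func):
    """Collapse every `steps_per_row` consecutive rates into one value via `func`."""
    return [
        func(rates[i:i + steps_per_row])
        for i in range(0, len(rates), steps_per_row)
    ]


# Hypothetical data: 120 one-second rates, all 50 except a single spike of 1000.
rates = [50] * 120
rates[30] = 1000

row_min = consolidate(rates, 60, min)  # stands in for the MIN RRA
row_max = consolidate(rates, 60, max)  # stands in for the MAX RRA
row_avg = consolidate(rates, 60, lambda r: sum(r) / len(r))  # AVERAGE, for comparison

# Same arithmetic as CDEF:delta=MyValMax,MyValMin,-
delta = [hi - lo for hi, lo in zip(row_max, row_min)]

for lo, avg, hi, d in zip(row_min, row_avg, row_max, delta):
    print(f"min={lo} avg={avg:.1f} max={hi} delta={d}")
```

In the row that contains the spike, the MAX value and the delta keep it at full height while the average only moves a little, which is why plotting MIN and MAX (or the stacked delta) shows what an AVERAGE-only graph hides.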