[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted
Mark.Seger at hp.com
Fri Jul 20 23:30:56 CEST 2007
ok, I really hate to be a pain but I really want to get this right too
and have spent almost 2 solid days trying to understand this and am
still puzzled! I think rrd is pretty cool but if it's not the right
tool to be using, just tell me. I believe when doing system monitoring
it's absolutely vital to report accurate data. I also understand some
people are satisfied with averages but not me, especially if I'm trying
to do benchmarks or accurately measure performance.
Now for some details - I've collected data in 10 second samples for an
entire day. I created a rrd database with a step of 10 and capable of
holding 8640 samples so as I understand it all data should be recorded
accurately and I believe it is. Then I loaded it up with data. I then
examined the data by doing a 'fetch' and verified the correct values are
stored. When I plot the data using MAX, I'm missing a lot of low values
and I understand it's because I have 8600 samples and a plot that's only
800+ pixels wide, but what I'd like to try and understand is: does anyone
know how gnuplot gets it right? I'm looking at a gnuplot plot of the
same data and it shows a much better representation of what is going on
with the network.
have a look at the first plot, which
was generated by gnuplot, and now look at
http://webpages.charter.net/segerm/cpu.png which was generated by
rrdtool for just the cpu-wait data. this rrd plot implies there is a
constant wait of about 20% while the other plot clearly shows it
fluctuating between almost 0 and 20. What I want to know is how is it
that gnuplot can do this and rrdtool can't? They both have about the
same number of pixels to play with so I'm guessing gnuplot is doing some
more sophisticated consolidation and whatever that is, I'd like to
suggest rrdtool offer that as an additional option. I should also point
out that, in fact, the gnuplot plot is only 640 pixels wide to rrd's 881, so
even though rrd has more pixels to play with, gnuplot does a better job.
please understand I'm only trying to understand what's happening and see
if there's a way to improve rrd's accuracy because if people are relying
on it to reflect an accurate picture of their environment, I think this
is pretty important.
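To make the pixel-column issue concrete, here is a small Python sketch (not rrdtool code; the numbers just mirror the 8640-sample, 800-pixel example above) of why a one-sample spike survives MAX consolidation but all but disappears under AVERAGE:

```python
# A minimal sketch of graph-time consolidation: when there are more
# samples than pixel columns, each column shows one consolidated value,
# and the consolidation function decides whether spikes survive.

def consolidate(samples, columns, fn):
    """Reduce `samples` to `columns` values, one per pixel column."""
    per_col = len(samples) // columns
    return [fn(samples[i * per_col:(i + 1) * per_col]) for i in range(columns)]

samples = [10.0] * 8640          # a quiet day of 10-second samples...
samples[4321] = 1000.0           # ...with a single spike

avg_cols = consolidate(samples, 800, lambda s: sum(s) / len(s))
max_cols = consolidate(samples, 800, max)

# Averaging ~10 samples per column dilutes the spike down to ~109,
# while taking the max per column preserves it at 1000.
print(max(avg_cols), max(max_cols))
```

This is only the averaging effect at graph time; the RRA's own consolidation happens first, on top of this.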
anyone else care to comment?
Tobias Oetiker wrote:
> Hi Mark,
> yes the 'lost' spike confuses people ... most, when they start
> thinking about it, see that rrdtool does exactly the right thing,
> it uses the consolidation method of the data being graphed to
> further consolidate for the graph ...
> so if you are using MAX as the consolidation function for the RRA, the
> grapher will use MAX too. If you are averaging the data, the
> grapher will use the same function too ...
> if you have textual suggestions for the grapher documentation I
> will be glad to include them
> Today Mark Seger wrote:
>> Alex van den Bogaerdt wrote:
>>> On Fri, Jul 20, 2007 at 12:31:25PM -0400, Mark Seger wrote:
>>>> more experiments and I'm getting closer... I think the problem is the
>>>> AVERAGE in my DEF statements of the graphing command. The only problem is
>>>> I couldn't find any clear description or examples of how this works. I
>>>> did try using LAST (even though I have no idea what it does) and my plots
>>>> got better, but I'm still missing data points and I want to see them all.
>>>> Again, I have a step size of 1 second so I'd think everything should just
>>>> be there...
>>> Last time I looked, which is several moons ago, the graphing part
>>> would average different samples which needed to be "consolidated"
>>> due to the fact that one was trying to display more rows than there
>>> were pixel columns available.
>> Ahh yes, I think I see now. However, and I simply point this out as an
>> observation, it's never good to throw away or combine data points as you might
>> lose something really important. I don't know how gnuplot does it but I've
>> never seen it lose anything. Perhaps when it sees multiple data points it just
>> picks the maximum value. hey - I just tried that and it worked!!!
>> This may be obvious to everyone else but it sure wasn't to me. I think the
>> documentation could use some beefing up in this place as well as some
>> examples. At the very least I'd put in an example that shows a series that
>> contains data with a lot of values <100 and a single point of 1000. Then
>> explain why you never see the spike! I'll bet a lot of people would be
>> shocked. I also wonder how many system managers are missing valuable data
>> because it's simply getting dropped.
>>> (I wrote consolidated surrounded by quotation marks because it isn't
>>> really consolidation that's happening)
>>> In other words: unless your graph is 50k pixels wide, you will have
>>> to select which 400 out of 50k rates you would like to see, or you
>>> will have to deal with the problem in a different way. For example:
>>> If you set up a MAX and a MIN RRA, and you carefully craft their parameters,
>>> you could do something like this:
>>> * Consolidate 60 rates (1 second each) into one (of 60 seconds).
>>> This means setting up an RRA with steps-per-row 60.
>>> * Display 400 x 60 seconds on a graph (or adjust the graph width,
>>> together with the amount of CDPs to plot).
>>> * Do this using (you fill in the blanks):
>>> (That first area does not plot anything, and it is not supposed to.
>>> The second area displays a line from min to max.)
>>> * Do the same for 3600 steps per row, and 400x3600 seconds per graph
>>> and so on. Of course you can adjust the numbers to your liking.
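For what it's worth, the "fill in the blanks" step above might look roughly like this (a sketch only: the file name, data-source name, and color are invented; the first AREA is given no color, so it plots nothing, and the second is STACKed on top of it, drawing from min up to max in each pixel column):

```
rrdtool graph band.png --width 400 \
    DEF:lo=net.rrd:octets:MIN \
    DEF:hi=net.rrd:octets:MAX \
    CDEF:range=hi,lo,- \
    AREA:lo \
    AREA:range#FF0000::STACK
```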