[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Mark Seger Mark.Seger at hp.com
Tue Jul 24 21:29:31 CEST 2007

I tried what you suggested by using a min/max RRA in addition to storing 
all the data in a 3rd RRA and it appears to work as you suggested it 
would, but some of the peaks aren't as high as I expected and so I did 
some further digging to try and understand what I'm seeing.  I have 
samples that were taken every 10 seconds starting at 16 seconds past the 
minute.  On my command to create the rrd database, I have:

--start 1123992076  --end 1124076920

Note that the sampling time drifted 4 seconds over the course of the 
day. I've since fixed that in my data collector, so it now collects 
samples without any drift and on multiples of an even minute, and this 
shouldn't be an issue in the future.

In any event, when I use fetch to look at the contents of the rrd 
database that contains a day's worth of 10 second samples, starting at 
1123992076, I see the first interval at 1123992080, which leads me to a 
couple of questions:
- does rrd choose to normalize data to the nearest minute boundary and 
therefore I get timestamps of 1:00, 1:10, 1:20 even if I enter data as 
1:06, 1:16, etc?  Keep in mind my start time for the create DOES land on 
a 6 second boundary.
- assuming it does pick its own intervals, is that why none of the 
numbers stored in rrd match the source data even though I have one row 
per sample, because it's normalizing the data?
- if that's the case, I suppose that would explain why I'm not seeing 
the right numbers in my plots.
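
The effect asked about in the questions above can be modelled in a few lines. This is a simplified sketch, not RRDtool's actual code: it assumes a GAUGE-style data source, no unknown values and no heartbeat handling, and the names `next_boundary` and `normalize` are made up for illustration. Each sample is treated as constant over the interval that ends at its timestamp, and each stored row is the time-weighted average over one step ending on a multiple of the step.

```python
def next_boundary(t, step):
    """Smallest multiple of `step` that is >= t."""
    return -(-t // step) * step


def normalize(samples, step):
    """Resample sorted (timestamp, value) pairs onto multiples of `step`.

    Simplified model (an assumption, not RRDtool source): the value of
    each sample is spread over the interval back to the previous sample,
    and each row is the time-weighted average over (b - step, b].
    """
    first, last = samples[0][0], samples[-1][0]
    # First row whose whole interval (b - step, b] is covered by samples.
    b = next_boundary(first, step) + step
    rows = []
    while b <= last:
        lo = b - step
        total = 0.0
        for (t0, _), (t1, v1) in zip(samples, samples[1:]):
            # Seconds of the sample interval (t0, t1] inside (lo, b].
            overlap = min(t1, b) - max(t0, lo)
            if overlap > 0:
                total += overlap * v1
        rows.append((b, total / step))
        b += step
    return rows


# Samples taken at :06, :16, ... with a single spike of 100 at t=26.
samples = [(6, 0), (16, 0), (26, 100), (36, 0), (46, 0)]
print(normalize(samples, 10))          # [(20, 40.0), (30, 60.0), (40, 0.0)]

# The first 10-second boundary at or after the create start time.
print(next_boundary(1123992076, 10))   # 1123992080
```

In this model the spike of 100 is split across two rows (40 and 60), so no stored row matches the source value and the peak looks lower. Samples that already fall on multiples of the step come through unchanged.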

As I said in an earlier mail message, I think rrd is real good at what 
it does and I'm only trying to understand the areas in which someone is 
trying to do something it really wasn't intended for, and this sounds 
like it may be one of those areas.


Alex van den Bogaerdt wrote:
> On Sun, Jul 22, 2007 at 07:23:44AM -0400, Mark Seger wrote:
>> I think this makes a lot of sense and I'm certainly willing to give it a 
>> shot and see what the results look like as you can never be completely 
>> sure until you try it.  So the next obvious question becomes how much 
>> work is it to do something like this and, more importantly, how much 
>> overhead will this add?  From what I've already seen in preliminary 
>> testing, rrdgraph is already very fast, so I'm hoping this 
>> will be too.
>> Does this involve storing both min and max values?  I just did a quick 
>> test and found that storing about 80 10 sec samples requires about 20MB 
>> for a day's worth of data on 1 system, and I normally retain a week since, 
>> as we all know, system problems sometimes go undetected for several days 
>> and you need to retain that level of detail.
> Best effort basis, from memory and without testing.
> Step size one second, you want a week which is 7 days of 86400 seconds
> each, so you need to retain 7 * 86400 = 604800 rows in the RRA with
> the best resolution.  This RRA can be min,avg,max or last. It doesn't
> matter if you don't use consolidation.  Use AVERAGE because then you
> know for sure RRDtool will not select the wrong RRA later on if you
> happen to give slightly wrong inputs at graph time.
> Then you should also have a couple of RRAs at a lower resolution. These
> RRAs will contain your MIN and MAX values. You will need to think about
> the perfect resolution to look at this data, and set its parameters
> accordingly.  Suggestion: 432 pixels and 200 seconds per pixel column
> (to display a day at a time) or 1400 seconds per pixel (to look at a
> week).  You can use the same RRA to display a week or a day, no problem.
> This means:
> 1 RRA, AVERAGE, 1 step per row, at least 604800 rows
> 1 RRA, MIN, 200 steps per row, at least 3024 rows
> 1 RRA, MAX, 200 steps per row, at least 3024 rows
> (200*3024=604800; also a full week)
> Display script 1, showing the extremes:
> rrdtool graph pic1.png --start end-7d -w 432 \
> 	DEF:min=my.rrd:ds0:MIN \
> 	DEF:max=my.rrd:ds0:MAX \
> 	CDEF:dif=max,min,- \
> 	AREA:min \
> 	AREA:dif#000000::STACK
> Display script 2, showing high resolution data:
> rrdtool graph pic2.png --start end-432sec -w 432 \
> 	DEF:ds0=my.rrd:ds0:AVERAGE \
> 	LINE1:ds0#000000
> Both examples can and probably should be expanded, for instance
> setting an end time, using other colors better matching your
> capabilities, changing width and/or height, and so on.
> Please understand that the 200-seconds-per-row RRA will only show
> finished data, meaning it will not show the most recent seconds
> (up to 199 seconds).
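
The row counts and per-pixel figures in the quoted reply can be checked with a little arithmetic. The sketch below does only that. The helper name `rra` and the xff value of 0.5 are assumptions for illustration and do not come from the reply; the strings follow the `RRA:CF:xff:steps:rows` form used by `rrdtool create`.

```python
DAY = 86400
WEEK = 7 * DAY            # 604800 seconds


def rra(cf, step, steps_per_row, span=WEEK, xff=0.5):
    """Build an RRA definition string that covers `span` seconds.

    `step` is the base step of the RRD in seconds. The xff default of
    0.5 is an assumed value, not taken from the quoted reply.
    """
    rows = -(-span // (step * steps_per_row))   # round up to whole rows
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


# One-second step, as in the quoted reply.
rras = [rra("AVERAGE", 1, 1), rra("MIN", 1, 200), rra("MAX", 1, 200)]
for line in rras:
    print(line)

# Seconds represented by each pixel column on a 432-pixel-wide graph.
per_pixel_day = DAY // 432     # 200
per_pixel_week = WEEK // 432   # 1400
print(per_pixel_day, per_pixel_week)

# The same full-resolution RRA for a 10-second step needs fewer rows.
print(rra("AVERAGE", 10, 1))
```

With a 10-second step, a week of full-resolution data needs 60480 rows rather than 604800.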
