[rrd-users] Questions about different RRA`s
Simon Hobson
linux at thehobsons.co.uk
Thu Apr 5 08:49:12 CEST 2007
David Schulz wrote:
>rrdtool create process_count.rrd --step 60 DS:pc:GAUGE:120:0:U
>RRA:AVERAGE:0.5:1:525960, which is supposed to take one value every
>60 seconds, for one year,
>Can someone show me how to best create one such .rrd with multiple
>RRA`s, explain to me the calculations you made to decide on the
>values for the RRA`s?
>but it appears it is better to split the different required times
>into different RRA`s. I have no idea how to calculate this though. My
>main Problem is to determine the exact Values for the individual
>RRA:AVERAGE:0.5:?:?.
You need to decide three things:
What is the highest resolution data you need?
and, for each graph period:
How long do you want to graph for?
At what resolution do you want to graph?
There is no simple formula for answering these questions, as it all
depends on the requirements of your application. The problem with
your first approach is that it stores a LOT of data that you will
never use, and takes a LOT of effort to create each graph - the whole
point of a round-robin database is that it automates the reduction of
data over time so as to balance storage against processing.
Let's work one possibility through. You have already decided that you
want per-minute values for your most detailed graph, so your step
size is 60. You probably do not want to graph at that level of
detail for long, so let's say 2 days max. 2 days is 2 * 24 * 60 * 60
seconds, and dividing by the initial step size of 60 we end up with 2
* 24 * 60 = 2880 samples required - so RRA:AVERAGE:0.5:1:2880
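That arithmetic generalises to: rows = period in seconds / (step * primary data points per row). A small sketch (the variable names here are mine, not anything rrdtool defines):

```shell
# Rows needed for 2 days of full-resolution data at a 60-second step.
step=60          # base step in seconds, from the create command
pdps_per_row=1   # primary data points consolidated into each stored row
period=$(( 2*24*60*60 ))                    # 2 days in seconds
rows=$(( period / (step * pdps_per_row) ))  # 2880
echo "RRA:AVERAGE:0.5:${pdps_per_row}:${rows}"
```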
Next graph is a weekly one, so let's say 14 days max. You really
aren't going to draw a graph with as much detail over that timescale,
so let's say 30-minute samples. Each slot in the database is now going
to be derived from 30 primary values, so that is your first number
required. 14 days is 14 * 24 * 60 * 60 seconds; divided by 30 * 60
seconds we get 14 * 24 * 60 / 30 = 672. So our RRA definition is
RRA:<function>:x:30:672. The x is the xfiles factor: the fraction of
primary data points that may be unknown before the consolidated
value is itself declared unknown, so 0.5 means up to half the samples
may be missing.
So, RRA:AVERAGE:0.5:30:672 means keep an average, with 30 primary
samples to each derived sample, and keeping 672 values in the table.
You are now in a position to graph with a resolution of 1 minute
going back up to 2 days, or with 30 minute resolution going back up
to 14 days.
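Putting those two RRAs together, the create command from the original post would become something like this (the DS definition and filename are carried over from David's command unchanged):

```shell
rrdtool create process_count.rrd --step 60 \
    DS:pc:GAUGE:120:0:U \
    RRA:AVERAGE:0.5:1:2880 \
    RRA:AVERAGE:0.5:30:672
```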
Just apply the same process to your other timescales.
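As an illustration of applying the same process further out (the retention periods and resolutions here are my own arbitrary choices, not anything from the original post): say you also want about two months at 2-hour resolution and about two years at 1-day resolution.

```shell
step=60
# ~2 months (62 days) at 2-hour resolution: 2h = 120 PDPs of 60s each
rows_month=$(( 62*24*60*60 / (step * 120) ))
echo "RRA:AVERAGE:0.5:120:${rows_month}"    # 744 rows
# ~2 years (730 days) at 1-day resolution: 1 day = 1440 PDPs
rows_year=$(( 730*24*60*60 / (step * 1440) ))
echo "RRA:AVERAGE:0.5:1440:${rows_year}"    # 730 rows
```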
One other thing: you MUST store what you want to graph. So, if you
think you may want a graph showing MAX cpu load as well as average,
then you must also calculate and store an RRA using the MAX
consolidation function - you cannot infer the peaks from an averaged
RRA. If you have a very 'peaky' load then this is something you may
want to do.
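For example, keeping peaks alongside the averages just means duplicating each RRA line with the MAX function (a sketch, reusing the row counts worked out above):

```shell
rrdtool create process_count.rrd --step 60 \
    DS:pc:GAUGE:120:0:U \
    RRA:AVERAGE:0.5:1:2880 RRA:MAX:0.5:1:2880 \
    RRA:AVERAGE:0.5:30:672 RRA:MAX:0.5:30:672
```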
> I have read that for example this
>
>RRA:AVERAGE:0.5:1:60 should be one hour (1*60*60= 3600 seconds), and
>RRA:AVERAGE:0.5:60:24 should be 24 hours, but how about a week or a
>month for example? Why RRA:AVERAGE:0.5:60:24, and not RRA:AVERAGE:
>0.5:24:60, both should equal a week, right?
By now you should realise that the difference is in the available
resolution for graphing - 24 minutes vs 60 minutes per value. (And
note that with a 60-second step both of those span one day, not a
week: 60 * 24 = 24 * 60 = 1440 minutes.)
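A quick check of the total spans, assuming the 60-second step from the create command:

```shell
step=60
span_a=$(( step * 60 * 24 ))   # RRA:AVERAGE:0.5:60:24 -> 60 PDPs/row, 24 rows
span_b=$(( step * 24 * 60 ))   # RRA:AVERAGE:0.5:24:60 -> 24 PDPs/row, 60 rows
echo "$span_a $span_b"         # both 86400 seconds = 24 hours
```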
This is worth the effort to get right from the beginning. If you
change your mind you are faced with exporting, mangling, and
importing your data to create a new RRA - or simply throwing away
your data and starting again. Once you have the data being collected,
you can spend as long as you like dealing with the graphs.