[rrd-users] trying to understand the relationship between source data, what's in rrd and what gets plotted

Tobias Oetiker tobi at oetiker.ch
Sat Jul 21 00:16:02 CEST 2007


Hi Mark,

I will add the following to the rrdcreate manpage ....

=over

=item AVERAGE

the average of the data points is stored.

=item MIN

the smallest of the data points is stored.

=item MAX

the largest of the data points is stored.

=item LAST

the last data point is used.

=back

Note that data aggregation inevitably leads to loss of precision
and information. The trick is to pick the aggregate function such
that the I<interesting> properties of your data are kept across the
aggregation process.
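
To make the trade-off concrete, here is a minimal Python sketch of the four
consolidation functions applied to the kind of series Mark suggests further
down: many small values and a single spike of 1000. It is only an
illustration, not rrdtool's implementation; the function name consolidate and
the sample data are made up for this example, and unknown values are ignored
entirely.

```python
# Illustrative sketch only; consolidate() is a hypothetical helper,
# not part of rrdtool or any rrdtool binding.
def consolidate(points, cf):
    """Reduce a list of data points to a single value using the named CF."""
    if cf == "AVERAGE":
        return sum(points) / len(points)
    if cf == "MIN":
        return min(points)
    if cf == "MAX":
        return max(points)
    if cf == "LAST":
        return points[-1]
    raise ValueError(f"unknown consolidation function: {cf}")


# Assumed data: 59 samples of 50 and one spike of 1000,
# all landing in the same consolidated cell.
samples = [50] * 30 + [1000] + [50] * 29

for cf in ("AVERAGE", "MIN", "MAX", "LAST"):
    print(cf, consolidate(samples, cf))
```

With this data AVERAGE yields roughly 65.8, so the spike all but disappears,
while MAX reports 1000 and says nothing about the 59 quiet samples. MIN and
LAST both return 50.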



> I'd say my problems arose from the fact that I couldn't find any description of
> what average, min, max and last do!  If they're already somewhere maybe all
> that's needed is a link, but I didn't see anything.  That said, in the create
> manpage under RRA I'd expand the description of the consolidation function and
> also include sections for each of these 4 options in the same way there are
> sections for gauge, counter, derive, absolute and compute under DS.  For
> example (and only just a suggestion building on what's already there):
>
> "The data is also processed with the consolidation function (/CF/) of the
> archive. When there is more than one data element [better words?] to be stored
> in the same cell [I'm not familiar enough to know what rrd calls these] those
> data elements must be consolidated into 1 and you must select how that
> consolidation is to be done by selecting one of the following:
>
> AVERAGE - all the data elements are averaged.
> MIN - the smallest data element is chosen
> LAST - the last data element is used
> MAX - the data element with the maximum value is used
>
> One must also realize this process is not perfect.  If you have a lot of
> samples being consolidated into a single one and there is a spike or a very low
> value, they will probably never be seen if you're using average.  On the other
> hand if you have a lot of small values and a single spike being consolidated,
> you could get misleading results if you chose max.
>
> This effect can be even more noticeable when plotting because the default plot
> width is 400, so all data must fit into one of 400 points.  If you have more
> than 400 data elements to plot, you are guaranteed some consolidation will
> occur in this case.  This effect can be reduced by making wider plots but you
> can't escape it."
>
> how's that?
> -mark
>
> Tobias Oetiker wrote:
> > Hi Mark,
> >
> > yes the 'lost' spike confuses people ... most, when they start
> > thinking about it, see that rrdtool does exactly the right thing,
> > it uses the consolidation method of the data being graphed to
> > further consolidate for the graph ...
> >
> > so if you are using MAX as consolidation function for the RRA, the
> > grapher will use MAX too. If you are averaging the data, the
> > grapher will use the same function too ...
> >
> > if you have textual suggestions for the grapher documentation I
> > will be glad to include them
> >
> > thanks
> > tobi
> > Today Mark Seger wrote:
> >
> >
> > > Alex van den Bogaerdt wrote:
> > >
> > > > On Fri, Jul 20, 2007 at 12:31:25PM -0400, Mark Seger wrote:
> > > >
> > > >
> > > > > > more experiments and I'm getting closer...  I think the problem is the
> > > > > > AVERAGE in my DEF statements of the graphing command.  The only
> > > > > > problem is I couldn't find any clear description or examples of how
> > > > > > this works.  I did try using LAST (even though I have no idea what it
> > > > > > does) and my plots got better, but I'm still missing data points and
> > > > > > I want to see them all.  Again, I have a step size of 1 second so I'd
> > > > > > think everything should just be there...
> > > > >
> > > > >
> > > > Last time I looked, which is several moons ago, the graphing part
> > > > would average different samples which needed to be "consolidated"
> > > > due to the fact that one was trying to display more rows than there
> > > > were pixel columns available.
> > > >
> > > >
> > > Ahh yes, I think I see now.  However, and I simply point this out as an
> > > observation, it's never good to throw away or combine data points as you
> > > might lose something really important.  I don't know how gnuplot does it
> > > but I've never seen it lose anything.  Perhaps when it sees multiple data
> > > points it just picks the maximum value.  hey - I just tried that and it
> > > worked!!!  This may be obvious to everyone else but it sure wasn't to me.
> > > I think the documentation could use some beefing up in this place as well
> > > as some examples.  At the very least I'd put in an example that shows a
> > > series that contains data with a lot of values <100 and a single point of
> > > 1000.  Then explain why you never see the spike! I'll bet a lot of people
> > > would be shocked.  I also wonder how many system managers are missing
> > > valuable data because it's simply getting dropped.
> > >
> > > -mark
> > >
> > > > (I wrote consolidated surrounded by quotation marks because it isn't
> > > > really consolidating what's happening)
> > > >
> > > > In other words: unless your graph is 50k pixels wide, you will have
> > > > to select which 400 out of 50k rates you would like to see, or you
> > > > will have to deal with the problem in a different way. For example:
> > > >
> > > > If you set up a MAX and MIN RRA, and you carefully craft their parameters,
> > > > you could do something like this:
> > > >
> > > > * Consolidate 60 rates (1 second each) into one (of 60 seconds).
> > > >   This means setting up an RRA with steps-per-row 60.
> > > > * Display 400 x 60 seconds on a graph (or adjust the graph width,
> > > >   together with the amount of CDPs to plot).
> > > > * Do this using (you fill in the blanks):
> > > >     DEF:MyValMin=my.rrd:minrra:...
> > > >     DEF:MyValMax=my.rrd:maxrra:...
> > > >     CDEF:delta=MyValMax,MyValMin,-
> > > >     AREA:MyValMin
> > > >     AREA:delta#FF0000:values:STACK
> > > >   (That first area does not plot anything, and it is not supposed to.
> > > >   The second area displays a line from min to max.)
> > > > * Do the same for 3600 steps per row, and 400x3600 seconds per graph
> > > >
> > > > and so on.  Of course you can adjust the numbers to your liking.
> > > >
> > > > HTH
> > > >
> > > >
> > >
> >
> >
>
>
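
Alex's MIN/MAX recipe quoted above can be sketched outside rrdtool as well.
The following Python fragment is only an illustration under made-up
assumptions: the helper consolidate_rows, the flat value of 50 and the spike
position are all hypothetical. It consolidates 400 x 60 one-second rates into
400 rows, once with min and once with max, and then computes the same
difference as the CDEF in the quoted example.

```python
# Illustrative sketch only; consolidate_rows() is a hypothetical helper
# that mimics an RRA with a given steps-per-row, ignoring unknown values.
def consolidate_rows(rates, steps_per_row, cf):
    """Apply cf (e.g. min or max) to consecutive groups of steps_per_row rates."""
    return [cf(rates[i:i + steps_per_row])
            for i in range(0, len(rates), steps_per_row)]


# Assumed data: 400 * 60 one-second rates, flat at 50 with a single spike.
rates = [50] * (400 * 60)
rates[12345] = 1000

min_rra = consolidate_rows(rates, 60, min)  # stands in for the MIN RRA
max_rra = consolidate_rows(rates, 60, max)  # stands in for the MAX RRA

# Same arithmetic as CDEF:delta=MyValMax,MyValMin,-
delta = [hi - lo for lo, hi in zip(min_rra, max_rra)]

spike_row = 12345 // 60
print(len(delta), spike_row, min_rra[spike_row], max_rra[spike_row], delta[spike_row])
```

Each of the 400 rows maps onto one pixel column of a default-width graph. The
row containing the spike keeps both its minimum and its maximum, so the
stacked area from min to max still shows the spike. An average over the same
60 rates would flatten it.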

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten
http://it.oetiker.ch tobi at oetiker.ch ++41 62 213 9902


