[rrd-users] Using RRD for sparse and generic statistical data

Mon May 28 22:28:45 MEST 2001

Hello, I have just started investigating RRD as a tool for counting
statistics.
These statistics are numbers that represent the occurrence of an event at a
particular time (or within a given restricted time period). Thus we store
things like:
"user logins occurred 10 times in the past 5 minutes" or
"user logins occurred 0 times in the past 5 minutes" or
"user logins failed 0 times in the past 5 minutes", etc.

The set of stats is ever increasing - new stats are added every time a
programmer programs a new function.
So if the system now supports "proxy logins", we would add a new stat
to represent this. The stat is added on the fly, and the corresponding RRD
file is set up (via a script) on the fly as well. Thus when a programmer
adds a new stat, a new RRD is created for it automagically.

I want to use RRD to store data for a large number of statistics.
The stats are generated by code within various servers.
One key element is that a stat (for example, let's call it
"login.failed") is not generated at regular time intervals.
Instead it is generated (in this example) whenever a login fails.
This can be many times a minute or once a day.
The resolution required is only down to one hour for the past week,
and about one day for everything older than one week.
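
To put rough numbers on that, here is a small sketch in plain Python (not
RRD calls) of how many stored points each resolution would need. The helper
name rows_needed and the 5-year history length are my own assumptions for
illustration; the actual history length is not decided.

```python
HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY


def rows_needed(span_seconds, resolution_seconds):
    """Number of stored points needed to cover a span at a given resolution.

    Rounds up so that a partial last interval still gets a point.
    """
    return -(-span_seconds // resolution_seconds)


# One week kept at hourly resolution.
hourly_rows = rows_needed(WEEK, HOUR)

# Older data kept at daily resolution; 5 years is an assumed history length.
daily_rows = rows_needed(5 * 365 * DAY, DAY)

print(hourly_rows, daily_rows)
```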

There will be a great many generated stats. The level of detail (resolution
down to an hour) should be approximately the same for all stats.

Question 1: For many of the RRDs, the "step" will not be adhered to.
If I set the step to 30 minutes, I may "update" data into an RRD
every 30 minutes or only once a day. What I want is for RRD to take
the case of "no data at a specified time interval" to mean 0, not unknown
(I think).

For example: a login fails 2 times in the past 30 minutes. We send a
timestamp and the value "2" to RRD.
Then no more login failures occur for a day, so we don't send any data to
the RRD.
Then the next day 1 failure occurs, so we send the timestamp and a value of
1.
Thus RRD misses data points for most of the steps during that day.

The graph for this (over a 48-hour period, for example) should show the
number of login failures as 2 during that one period, 1 for the period of
the "next day", and 0 for all other periods in the graph.
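
To make the series I am after concrete, here is a minimal sketch in plain
Python. It does not use RRD at all; it only shows the zero-filled result I
would like to end up with. The function name fill_sparse and the timestamps
are made up for illustration, and the 30-minute step matches the example
above.

```python
def fill_sparse(events, start, end, step):
    """Return one count per step in [start, end), with 0 where nothing was sent.

    events: iterable of (timestamp, count) pairs, timestamps in seconds.
    """
    n_bins = (end - start) // step
    counts = [0] * n_bins
    for ts, count in events:
        if start <= ts < end:
            idx = (ts - start) // step
            if idx < n_bins:  # ignore a trailing partial step, if any
                counts[idx] += count
    return counts


STEP = 30 * 60  # 30-minute step

# Hypothetical timestamps: 2 failures in the first half hour,
# then nothing until 1 failure a day later.
events = [(0, 2), (24 * 3600, 1)]

series = fill_sparse(events, start=0, end=48 * 3600, step=STEP)
print(series)
```

Every step with no report comes out as 0 rather than as a missing value,
which is the behaviour I would like the graph to reflect.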

This type of 'sparse' data will be occurring a lot in our set of RRDs.

How can I do this? What is the best way?

Question 2: I have many servers running, each on its own machine.
Each can generate stats that have the same name (and the same period).
For example, there are 3 login servers and any one of them can generate a
"login.failure" stat at the same time. Each will send its data to an RRD
independently.
Thus the same RRD may receive 3 sets of data for the same time period.

Is this OK? I.e., will RRD accumulate values for the same time period
or will it overwrite them?

How can I make it accumulate values (within RRD)?
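
To show what I mean by "accumulate", here is a sketch in plain Python (again
not RRD calls) of the result I am hoping for. The server names, timestamps,
and the function name accumulate are hypothetical. If RRD cannot do this
itself, I assume I would need something like this in front of it.

```python
from collections import defaultdict


def accumulate(reports, step):
    """Sum counts that fall into the same step, whichever server sent them.

    reports: iterable of (server, timestamp, count) tuples.
    Returns a dict mapping the start time of each step to the total count.
    """
    totals = defaultdict(int)
    for _server, ts, count in reports:
        bucket = ts - (ts % step)  # start of the step this report belongs to
        totals[bucket] += count
    return dict(totals)


STEP = 30 * 60  # 30-minute step

# Three login servers reporting "login.failure" independently.
reports = [
    ("login1", 100, 2),
    ("login2", 650, 1),
    ("login3", 1799, 4),
    ("login1", 1800, 5),  # falls into the next step
]

totals = accumulate(reports, STEP)
print(totals)
```

Here the three reports inside the first 30 minutes are added together
instead of the last one replacing the others.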

Any help is appreciated!

Mike Papper
mike at bodaro.com
