[rrd-developers] Re: How to get the most performance when using lots of RRD files
Richard A Steenbergen
ras at e-gerbil.net
Thu Aug 17 22:00:44 MEST 2006
On Wed, Aug 16, 2006 at 04:30:03PM +0200, Tobias Oetiker wrote:
>
> Richard,
>
> a balanced performance (read AND write) without the need to
> periodically clean up the mess, was the prime idea behind the
> rrdtool datastructure. so if you know the holy grail on how to do
> this 'better' I would be most interested ...
Understood, but the tradeoff is that while the current design is very
simple and effective for lightweight use, it is very inefficient for large
scale use.
The current design of rrdtool is based around scripts calling tools which
do a transaction using a single .rrd file, and then quit. Can you imagine
the performance of an SQL database where, when you wanted to do an update,
you exec'd a shell that called /usr/local/bin/sqlcli mydbfile "UPDATE
ifInOctets SET data=1234 WHERE timestamp=blah", which then blocked while
it went out, opened the file, did some seeks, made some updates, and
closed the file? Sure it would probably "work" just fine if you had a
multi-GHz machine doing a few dozen calls every few minutes, but it would
suck horrifically if you actually had to do real work with it.
Note that I'm not suggesting we all run out and start moving our graphing
DBs to SQL, but the necessary architecture to scale to large data sets is
abundantly clear thanks to all those people who spend lots of time and
energy developing databases. Rather than try to reinvent the wheel
(especially in areas which are not core competencies) by duplicating a
database server/client model for RRD, there are some very clear benefits
to decoupling the database and graphing components. For some people full
SQL DBs may make sense, for others a more lightweight alternative might be
useful, etc. The solution here is to modularize, modularize, modularize.
BTW If you do want to implement an SQL based backend for RRD, please do a
better job of it than say RTG. They make the same mistake RRD does today,
using the table name to index data rather than actually letting the
database do its job, so each transaction with an ifindex actually hits two
separate files (one for in, one for out :P).
> also ideas for the architecture of a 'cool' api (for passing a long and
> flexible list of arguments into rrd functions) would be interesting ...
The C API shouldn't be cool or flexible, it should be simple and
efficient. It is the job of the frontend to be cool and flexible, and then
make simple low-level calls to an API to do the actual dirty work.
Instead of calling a C lib with rrd_update(argc, argv), you should just be
passing in the data you need to do an update, which is a timestamp, a DS,
and a value. Something like:
    struct rrd_update_param {
        struct timeval timestamp;   /* when the sample was taken */
        long           datasource;  /* which DS the value belongs to */
        rrd_value_t    value;       /* the sample itself */
    };
It is the job of the frontend tool to parse all the different forms of
time you want to support (N: value: value@ etc), to parse any templates
you want to use, etc.
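To illustrate that split, here is a minimal sketch of what the frontend side might look like: it turns one textual argument into the struct above, so the library never sees a string. The function name ex_parse_update() is made up for this example, rrd_value_t is assumed to be a plain double, and only the "N:<value>" and "<epoch-seconds>:<value>" forms are handled (no templates, no unknown values).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

typedef double rrd_value_t;   /* assumption for this sketch */

struct rrd_update_param {
    struct timeval timestamp;
    long           datasource;
    rrd_value_t    value;
};

/*
 * Hypothetical frontend helper: parse "N:<value>" or
 * "<epoch-seconds>:<value>" for a single data source.
 * Returns 0 on success, -1 on malformed input (*out is left untouched).
 */
int ex_parse_update(const char *arg, long datasource,
                    struct rrd_update_param *out)
{
    struct rrd_update_param tmp;
    const char *colon;
    char *end;

    assert(arg != NULL && out != NULL);

    colon = strchr(arg, ':');
    if (colon == NULL || colon == arg || colon[1] == '\0')
        return -1;

    if (colon - arg == 1 && arg[0] == 'N') {
        /* "N" means "now": the frontend resolves it, not the library. */
        if (gettimeofday(&tmp.timestamp, NULL) != 0)
            return -1;
    } else {
        long secs = strtol(arg, &end, 10);
        if (end != colon || secs < 0)
            return -1;              /* junk before the colon */
        tmp.timestamp.tv_sec = secs;
        tmp.timestamp.tv_usec = 0;
    }

    tmp.value = strtod(colon + 1, &end);
    if (end == colon + 1 || *end != '\0')
        return -1;                  /* value missing or trailing junk */

    tmp.datasource = datasource;
    *out = tmp;                     /* commit only after everything parsed */
    return 0;
}
```

Any other time format or template syntax would be added here, in the frontend, without touching the low-level API.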
A proper C API should have functions like rrd_open(), rrd_close(), you
should be able to query information from it (such as obtaining the DS list
if necessary, etc) using something like rrd_info(), you should be able to
make calls (potentially multiple calls, as needed) to rrd_update(), etc.
For the actual rrd_update() call, you should be passing in an array of
struct rrd_update_param using a ptr and a length value. There should be
little or no text parsing inside the C API; that is for the frontend tool
to do or not do. An update should be an atomic transaction, and if it
can't do it, it should return an error for the frontend to handle.
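A rough sketch of that shape, to make it concrete. This is not the existing librrd interface: the ex_ names are invented for the example, the "database" is just an in-memory struct standing in for an open .rrd file, and rrd_value_t is assumed to be a double. The point is only the calling convention: open once, pass a pointer and a length, get all-or-nothing behaviour and an error code back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/time.h>

typedef double rrd_value_t;   /* assumption for this sketch */

struct rrd_update_param {
    struct timeval timestamp;
    long           datasource;
    rrd_value_t    value;
};

#define EX_MAX_DS 8

/* Hypothetical handle; a real one would wrap an open .rrd file. */
struct ex_rrd {
    long           ds_count;
    struct timeval last_update[EX_MAX_DS];
    rrd_value_t    last_value[EX_MAX_DS];
};

struct ex_rrd *ex_rrd_open(long ds_count)
{
    struct ex_rrd *h;

    if (ds_count < 1 || ds_count > EX_MAX_DS)
        return NULL;
    h = calloc(1, sizeof *h);
    if (h != NULL)
        h->ds_count = ds_count;
    return h;
}

void ex_rrd_close(struct ex_rrd *h)
{
    free(h);
}

/* Returns nonzero if *a is strictly later than *b. */
static int tv_after(const struct timeval *a, const struct timeval *b)
{
    return a->tv_sec > b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_usec > b->tv_usec);
}

/*
 * Apply n updates as one transaction. Returns 0 on success; on any bad
 * entry returns -1 and leaves *h exactly as it was.
 */
int ex_rrd_update(struct ex_rrd *h, const struct rrd_update_param *p,
                  size_t n)
{
    struct ex_rrd scratch;
    size_t i;

    if (h == NULL || (p == NULL && n > 0))
        return -1;
    assert(h->ds_count >= 1 && h->ds_count <= EX_MAX_DS);

    scratch = *h;   /* work on a copy so a failure changes nothing */
    for (i = 0; i < n; i++) {
        long ds = p[i].datasource;

        if (ds < 0 || ds >= scratch.ds_count)
            return -1;              /* unknown data source */
        if (!tv_after(&p[i].timestamp, &scratch.last_update[ds]))
            return -1;              /* time must move forward per DS */
        scratch.last_update[ds] = p[i].timestamp;
        scratch.last_value[ds]  = p[i].value;
    }
    *h = scratch;   /* commit */
    return 0;
}
```

A long-running collector would call ex_rrd_open() once, call ex_rrd_update() as often as it likes with whatever batches it has, and call ex_rrd_close() on shutdown, rather than paying for an open/seek/close cycle on every sample.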
The funny thing is that most of these functions and functionality are in
there, they're just in the wrong places to be useful to anyone writing C
programs. All you really need to do is move things around a little bit so
you have layers of abstraction which make sense.
> for the return values I do have something I like in rrd_info .... I
> am thinking about integrating that across the board for a future
> version ...
Haven't seen it.
> if anyone wants to sponsor work in this direction, talk to me, I'll
> be glad to make an offer.
My recommendation would be that you work with someone who really
understands good C style and high level design. The issue with the current
code isn't so much the functionality as it is the organization, and you
could really get a lot more people involved in maintaining the codebase
with some cleanup to make it more logical.
Unfortunately I'm involved in about a billion projects right now, so I'm
not sure how much time I can actually devote to this. I would have
submitted some modifications for the update code already, but I'm still
using a 1.2rc2 CVS snapshot from May 2005 for my base code. Something
happened to make the pretty pictures look significantly less pretty in
every version I've tried afterwards, and I honestly can't figure out what
(not a graphics person :P).
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)