[rrd-developers] Re: How to get the most performance when using lots of RRD files
Richard A Steenbergen
ras at e-gerbil.net
Sun Aug 20 00:25:34 MEST 2006
On Fri, Aug 18, 2006 at 07:05:58AM +0200, Tobias Oetiker wrote:
> Hi Richard,
>
> > The current design of rrdtool is based around scripts calling tools which
> > do a transaction using a single .rrd file, and then quit.
>
> if you have lots of data I guess you would NOT use the cli but
> rather the perl module ... but besides this ....
Which then calls the CLI, yes? Using the perl module is one way to manage
complexity; writing your own interface to call rrd functions is another.
Perl is not a good solution for every problem. :)
> > Note that I'm not suggesting we all run out and start moving our graphing
> > DBs to SQL, but the necessary architecture to scale to large data sets is
> > abundantly clear thanks to all those people who spend lots of time and
> > energy developing databases.
>
> Have you actually run tests with databases on this? Are they
> faster when you update hundreds of thousands of different 'data
Are intelligent buffered writes to a structured db multiplexed by a
persistent server process more efficient than starting a new process which
blocks while it does open, lock, write, close, and exit, for every
transaction? Absolutely.
> * ds table
> ds-id, name, type, min, max
>
> * data table
> ds-id, timestamp, value
Pretty much. There are advantages to having a persistent poller here too,
since you can cache the DS ids and just fire off a batch of updates every
time your poller cycle hits, without needing to query DS status. The same
goes for handling counters or absolutes if you want to store data as native
rates: you'd want to minimize db transactions, though you could also
accomplish this with server-side db functions.
> or would you create a different table for each 'datasource'?
This is a Bad Idea (tm), and one of the fundamental mistakes that RTG
makes. Using table names to index data is not what relational databases
were meant to do, and takes you right back to the same problem you have
today. :)
> well the rrd_update example is nice, but how would you go for
> something like rrd_create, or rrd_graph ?
Well, rrd_update() is probably the most important in terms of reducing
overhead and needing a good C API, so the fact that it is the simplest is
a bonus feature. :) But as far as other functions go... Really all you
need to do is look at how you implement these things yourself, and then
organize it so that users can do the same.
For example, let's take graphing... What are the logical steps involved?
You need to load data from an rrd file, then define any necessary cdef
expressions using that data and the elements you want to graph, and
finally render the graph based on a bunch of parameters (some of which
are required and fixed into the API, some of which are optional, and
some of which are just behavioral flags).
Here is an example that I just pulled out of my ass. I don't know enough
about the internal implementation of rrdtool to say whether this is exactly
how it should be done, so there may be plenty of modifications or
optimizations available, but this is an "artist's concept" of what use of a
proper C API might look like (error checking omitted for simplicity, of
course :P):
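/* Handles for the objects of this (hypothetical) API. */
struct RRD_DB *rrd_db;
struct RRD_DEF *rrd_def;
struct RRD_CDEF *rrd_cdef;
struct RRD_GRAPH *rrd_graph;
time_t start, end;
/* Optional settings as key/value pairs; the list needs an end marker. */
struct RRD_GRAPH_CFG rrd_graph_cfg[] = {
    { RRD_GRAPH_CFG_TITLE, "Some title" },
    { RRD_GRAPH_CFG_FONT, "/somepath/somefont.ttf" },
    /* etc. etc. */
    { RRD_GRAPH_CFG_END, NULL }  /* hypothetical end-of-list marker */
};
/* Open the file once and load the DS we want to plot. */
rrd_db = rrd_open("/somepath/somefile.rrd");
rrd_def = rrd_def_load(rrd_db, "ifInOctets", RRD_CF_AVERAGE);
/* CDEF: octets * 8 = bits, kept symbolic until render time. */
rrd_cdef = rrd_cdef_create("%s,8,*", rrd_def);
/* Required size as arguments, optional settings via the cfg array. */
rrd_graph = rrd_graph_new(640, 480, rrd_graph_cfg);
/* Behavioral flags OR'ed together. */
rrd_graph_config_flags(rrd_graph, RRD_CONFIG_FLAGS_LAZY | RRD_CONFIG_FLAGS_RIGID);
rrd_graph_element_add(rrd_graph, RRD_GRAPH_ELEMENT_LINE1, rrd_def, "#777777", "Legend");
rrd_graph_element_add(rrd_graph, RRD_GRAPH_ELEMENT_AREA, rrd_cdef, "#00aabb", "Blah");
/* Render the last hour... */
start = rrd_time_expression("-1h");
end = rrd_time_expression("now");
rrd_graph_render_file(rrd_graph, "/somepath/somefile.png", RRD_GRAPH_TYPE_PNG, start, end);
/* ...then the last day, reusing every definition above. */
start = rrd_time_expression("-24h");
rrd_graph_render_file(rrd_graph, "/somepath/somefile2.png", RRD_GRAPH_TYPE_PNG, start, end);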
Graphing the same thing across different time ranges seems like a
pretty common operation to me (more so than reuse of pretty much anything
else), so more than likely you'd want to optimize for that case. I would
think that you'd probably want the "dynamic calculation" elements like
cdefs and vdefs to stay symbolic representations of what to do with the
real data from defs all the way until you do the render, so you only need
to do the calculations on the specific datapoints you're graphing and not
the entire DS.
> > Unfortunately I'm involved in about a billion projects right now
> > [...]
>
> there you go .. and so it ends ... most of the time
Well, what I mean is that I don't have the free time to completely rewrite
this myself, but I can certainly do my part to help. :)
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
More information about the rrd-developers mailing list