[rrd-developers] Re: How to get the most performance when using lots of RRD files
José Luis Tallón
jltallon at adv-solutions.net
Sun Aug 20 20:41:50 MEST 2006
Martin Sperl wrote:
> I remember having made similar observations some time ago, so I have
> already written a SQL backend to RRD (look for a libdbi patch) and it
> works quite fine for us with more than 60000 data sources added every 5
> minutes resulting in currently 100M rows of data in the format (time
> stamp,data source-id,value). There is much less IO overhead
.... depending on the effectiveness of the caches and the DB backend;
that is worth keeping in mind.
> adding data to the database than with adding to an rrd file - also you
> can separate this to different machines easily...
> The performance observation I have made is that, with this huge table,
> graphing one data source takes some time to fetch the data initially
> (the index has to be read, ...) and then graphing again works very fast.
> But this is naturally correlated to OS and DB caching and all this is
> correlated to the memory-size of the server... So you will have the
> memory side of the problem anyway.
In-memory caching of the full RRDB (or the most-accessed parts, for that
matter) would only yield equivalent results (probably faster, in fact,
due to the simpler indexing / absence of indexing).
> Also with mysql there is a table locking issue: No row-level locking for
> myisam type tables and using InnoDB gives a performance penalty and
> increased size for data storage. But the way around this is to have 2
> (or more) tables:
> one (short-term) for entering data and a second one for "historic-read
> only" data, to which data needs to be moved regularly to keep the short
> term table small. This also allows using different table types for each
> of these tables (InnoDB for short term and MyISAM for long term).
.... having a single writer is another solution, and avoids copying
(unless consolidation can be done really intelligently)
> Regarding keeping Min,value,Max in the table in one row I believe that
> this will introduce more disk-space overhead than it is worth.
> Also the SQL backend is written in such a way that you can use almost
> any kind of Table-structure, that seems to fit your personally preferred
> data structure.
... in order to use the right tool for the right job, yes.
> For storing values in the database it is IMHO also much more efficient
> not to call rrdtool for storing the data, but adding the data directly
> to the database from your script,
If done properly, the overhead of calling rrdtool is completely negligible.
> as you will normally always have some
> additional data that needs to be fetched from the database anyway.
... which is "constant" mostly and can be cached.
> I assume that cacti is always doing something like this...
Yes, since it is implemented as a PHP script run from CRON which then
fetches some data from the DB and uses the CLI version of rrdtool.... :-(
> P.S.: [snip] The SQL patch also includes a mode for predicting future
> data together with a sigma. This is used for one of our applications to
> show if the current web traffic is "within normal bounds of
> operation"... (This does not need a special setup like the Holt-Winters
This really IS interesting. Looking forward to reading it.
> For this there is also the idea to use FFT or
> sine-least-square-fits to get other kinds of prediction instead of the
> current "shift and average" mode of operation, which works very well...
Might be a very nice addition too.... please post it when you feel it's ready.