[rrd-developers] Re: How to get the most performance when using lots of RRD files

Sun Aug 20 20:41:50 MEST 2006

Martin Sperl wrote:
> Hi!
> I remember having made similar observations some time ago, so I have 
> already written a SQL backend to RRD (look for a libdbi patch) and it 
> works quite fine for us with more than 60000 data sources added every 5 
> minutes resulting in currently 100M rows of data in the format (time 
> stamp,data source-id,value). There is much less IO overhead 
... depending on the effectiveness of the caches and the DB backend. Keep
that in mind.
> involved 
> adding data to the database than with adding to an rrd file - also you 
> can separate this to different machines easily...
>
> The performance observation I have made is that with this huge table 
> graphing one data source takes some time to fetch the data initially 
> (the index has to be read,...) and then graphing again works very fast. 
> But this is naturally correlated to OS and DB caching and all this is 
> correlated to the memory-size of the server... So you will have the 
> memory side of the problem anyway.
>   
In-memory caching of the full RRDB (or of the most-accessed parts, for that
matter) would yield at least equivalent results (probably faster ones, in
fact, due to the simpler indexing or the absence of indexing).
> Also with mysql there is a table locking issue: No row-level locking for 
> myisam type tables and using InnoDB gives a performance penalty and 
> increased size for data storage. But the way around this is to have 2 
> (or more) tables:
> one (short-term) for entering data and a second one for "historic-read 
> only" data, to which data needs to be moved regularly to keep the short 
> term table small. This also allows to use different table types for each 
> of these  tables (InnoDB for short term and MyISAM for longterm).
>   
... having a single writer is another solution, and it avoids the copying
(unless consolidation can be done really intelligently).
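
To make the two-table layout concrete, here is a minimal sketch. It
assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL,
so the InnoDB/MyISAM split is not represented. The table and function
names (samples_recent, samples_history, open_store, add_samples,
archive_older_than) are hypothetical and are not taken from the libdbi
patch. Rows use the (time stamp, data source-id, value) format, and the
move runs in a single transaction.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the short-term and historic tables (hypothetical names)."""
    conn = sqlite3.connect(path)
    for table in ("samples_recent", "samples_history"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "ts INTEGER NOT NULL, ds_id INTEGER NOT NULL, value REAL, "
            "PRIMARY KEY (ds_id, ts))"
        )
    return conn


def add_samples(conn, rows):
    """Single writer: insert (ts, ds_id, value) rows into the short-term table."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO samples_recent (ts, ds_id, value) VALUES (?, ?, ?)",
            rows,
        )


def archive_older_than(conn, cutoff_ts):
    """Move rows older than cutoff_ts to the historic table; return the count."""
    with conn:  # copy and delete happen in one transaction
        conn.execute(
            "INSERT INTO samples_history "
            "SELECT ts, ds_id, value FROM samples_recent WHERE ts < ?",
            (cutoff_ts,),
        )
        cur = conn.execute(
            "DELETE FROM samples_recent WHERE ts < ?", (cutoff_ts,)
        )
        return cur.rowcount
```

Calling archive_older_than() regularly (from the same process that
writes, for instance) keeps the short-term table small without a second
writer being involved.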
> Regarding keeping Min,value,Max in the table in one row I believe that 
> this will introduce more disk-space overhead than it is worth. 
Most probably
> [snip]
>
> Also the SQL backend is written in such a way that you can use almost 
> any kind of Table-structure, that seems to fit your personally preferred 
> data structure.
>   
... in order to use the right tool for the right job, yes.
> [snip]
>
> For storing values in the database it is IMHO also much more efficient 
> not to call rrdtool for storing the data, but adding the data directly 
> to the database from your script,
If done properly, the overhead of calling rrdtool is completely negligible
> as you will normally always have some 
> additional data that needs to be fetched from the database anyway.
... which is "constant" mostly and can be cached.
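
A minimal sketch of what I mean by caching, assuming the additional data
is a per-data-source lookup such as the RRD file path. The dictionary
standing in for the database, the path and the function name rrd_path_for
are all hypothetical.

```python
from functools import lru_cache

# Hypothetical stand-in for the metadata table in the database.
_FAKE_DB = {1: "/var/rrd/host1_traffic.rrd"}
DB_LOOKUPS = {"count": 0}  # counts how often the "database" is really hit


@lru_cache(maxsize=None)
def rrd_path_for(ds_id):
    """Return the mostly constant metadata for ds_id, querying only once."""
    DB_LOOKUPS["count"] += 1
    return _FAKE_DB[ds_id]
```

If the metadata does change, rrd_path_for.cache_clear() drops the cached
entries so the next call queries again.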
> I assume that cacti is always doing something like this...
>   
Yes, since it is implemented as a PHP script run from CRON which then
fetches some data from the DB and uses the CLI version of rrdtool.... :-(
> Ciao,
>           Martin
>
> P.s: [snip] The SQL patch also includes a mode for predicting future 
> data together with a sigma. This is used for one of our applications to 
> show if the current web traffic is "within normal bounds of 
> operation"... (This does not need a special setup like the Holt-Winters 
> forecasting!)

This really IS interesting. Looking forward to reading it.
>  For this there is also the idea to use FFT or 
> sine-least-square-fits to get other kinds of prediction instead of the 
> current "shift and average" mode of operation, which works very well...
>   
Might be a very nice addition too.... please post it when you feel it's
ready :-)
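
For readers who have not seen the patch: one plausible reading of "shift
and average" is sketched below. This is my assumption, not the patch's
actual code. The prediction for the next sample is the mean of the
samples at the same position in the previous cycles (the series shifted
by one period, two periods, and so on), and sigma is their sample
standard deviation. The function names and the n_sigma threshold are
hypothetical.

```python
from statistics import mean, stdev


def shift_and_average(series, period, shifts):
    """Predict the next sample of an evenly spaced series.

    series: past values, oldest first
    period: number of samples per cycle (e.g. one day of 5-minute steps)
    shifts: how many past cycles to average over
    Returns (prediction, sigma) for index len(series).
    """
    target = len(series)
    past = [
        series[target - k * period]
        for k in range(1, shifts + 1)
        if target - k * period >= 0
    ]
    if len(past) < 2:
        raise ValueError("need at least two past cycles to estimate sigma")
    return mean(past), stdev(past)


def within_normal_bounds(value, prediction, sigma, n_sigma=2.0):
    """True if value lies within n_sigma standard deviations of the prediction."""
    return abs(value - prediction) <= n_sigma * sigma
```

A new measurement can then be checked with within_normal_bounds() against
the (prediction, sigma) pair for its time slot.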


    J.L.
