[rrd-developers] Accelerator Daemon

Sun Jun 29 23:21:10 CEST 2008

Hi Florian,

On Tue, Jun 24, 2008 at 10:13:29PM +0200, Florian Forster wrote:
> I've invested some more time in the daemon, now called `rrdcached'. You
> can get a patch against the latest SVN trunk here:
> <http://verplant.org/temp/rrdtool-trunk-rrd_update_with_cache.patch>
> 
> The current status is:
> - `rrdcached' understands the `update' and `flush' commands. The daemon
>   is nowhere near production ready, though: There is no PID file, the
>   base directory `/tmp' is hard coded and I've generally done only
>   little testing.
> - `rrdtool update' understands the `--daemon' option. If given, updates
>   are NOT stored to disk but sent to the daemon instead.[1]
> - `rrdtool fetch' understands the `--daemon' option, too. I've changed
>   `rrd_fetch_fn' to take an additional `use_rrdcached' argument. If
>   given, the daemon is told to flush the file before it is read.[2]
> - `rrdtool graph' takes the `--daemon' option, too. If given, it tells
>   `rrd_fetch_fn' to flush the values.
> 
> I'd be grateful if some people (the more the better) could review the
> patch and give me some feedback.

> -- 
> Florian octo Forster
> Hacker in training
> GnuPG: 0x91523C3D
> http://verplant.org/

It looks like you did a good job implementing this update-buffering
patch, and you certainly implemented it quickly.  I would like to give
some feedback based on a review of the above patch and my own
experience with rrd.

In summary, this architecture may not address the key I/O problem I
have seen with rrd_update, which might be called the 'thundering herd'
problem.

When a large number of rrd updates occur at the same time, the
filesystem tries to write all of them to the disk files at once, which
slows the whole system.  I may be missing something, but it looks like
this will still happen with this patch if a client is both writing and
reading large numbers of rrd files simultaneously.  The patch does lazy
rrd_updates and would probably perform well on a system where fetch is
called rarely, but it might not work so well if a client graphs data
from all the rrd files after a series of updates.  In that case it
looks like flush will be called on most of the cache, which in turn
calls rrd_update on most of the files.

Perhaps an alternative architecture could cache the 'hot' rrd disk
blocks in application memory instead of in the filesystem cache.
rrd_update would modify the blocks in the application cache.  rrd data
fetch calls would first look for rrd blocks in this application cache
and, on a miss, read from disk through the filesystem.  An rrdcache
thread or some external daemon would continually write the updated hot
blocks to disk at a controlled rate; direct I/O might be useful here.
The goal of this architecture would be to take OS filesystem behavior
out of the rrd_update path.

Thanks,

Scott B