[rrd-developers] [FORGED] r2130 comments : rrd_parsetime, create_set_no_overwrite

Tue Sep 28 15:11:00 CEST 2010

Steve, I'm bringing this back to rrd-developers since there are a lot of
possibilities here...

> I’m also working on another option for rrdcached, -C <address>, to chain
> update requests on to a second rrdcached, allowing you to run a hot DR
> server with constantly updated RRD files.  Will probably have to chain on
> create, info and stats as well ...
>

I've given some thought to this too...  There are a couple of problems that
I can see:

(1) It's currently not possible to change much of rrdcached's running
configuration.  If the replication target ever changes, we want to avoid a
shutdown/start-up sequence.  (On my machine, this can take up to 15
minutes due to extensive journal replay -- which is another issue
altogether).

(2) A remote daemon that falls behind may cause the number of values queued
to grow without bound.  Currently the daemon discards the cached values as
soon as it enqueues the file on the write queue.

(3) If the remote daemon is unreachable for 2 * (flush timer), then all the
journal entries required to re-create the state will have been expired.  On
some systems, keeping the journals around will not be a burden.  On others,
it will (especially if they contribute to the start-up replay time).

(4) If we do not maintain infinite journals for (3), then we are forced to
use a different synchronization technique after we've passed 2*(flush
timer).  This probably includes export/import or scp of the RRD files.
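
Points (3) and (4) come down to a threshold check.  A minimal sketch in
Python, assuming the 2 * (flush timer) expiry described above; the function
name and the idea of tracking outage duration are hypothetical, not anything
rrdcached provides:

```python
def needs_full_resync(outage_seconds, flush_timer_seconds):
    """Return True when journal replay alone can no longer catch up.

    Assumption: journal entries expire after 2 * (flush timer), so an
    outage at least that long means some updates are no longer journaled
    and the RRD files themselves must be copied (export/import, scp, ...).
    """
    return outage_seconds >= 2 * flush_timer_seconds
```

With a 300 second flush timer, a 5 minute outage could still be replayed
from journals, while a 10 minute outage would need a file-level copy.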

To this end, I've been considering a second process that will "tail" the
journal files and repeat the flow back to another journal.  Then, this
process could be stopped/started independently, reconfigured as necessary,
etc.  It could store a minimal amount of state (rrd journal file name, seek
position) and be restarted easily.
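
A rough sketch of that tailing process, in Python for brevity.  Everything
here is an assumption for illustration: the function names, the JSON state
file, and the idea that the journal is a plain file of newline-terminated
records.  It only shows the "journal file name + seek position" state idea,
not a real replication transport.

```python
import json
import os
from pathlib import Path


def load_state(state_path):
    """Return (journal_name, offset), or (None, 0) when no state was saved."""
    try:
        with open(state_path, "r", encoding="utf-8") as fh:
            state = json.load(fh)
        return state["journal"], state["offset"]
    except FileNotFoundError:
        return None, 0


def save_state(state_path, journal_name, offset):
    """Persist the journal name and seek position via an atomic rename."""
    tmp = str(state_path) + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"journal": journal_name, "offset": offset}, fh)
    os.replace(tmp, state_path)


def relay_new_lines(journal_path, state_path, out_path):
    """Append complete lines written since the saved offset to out_path.

    Returns the number of lines copied.  A trailing partial line (one the
    writer has not finished yet) is left for the next call.
    """
    journal_path = Path(journal_path)
    saved_name, offset = load_state(state_path)
    if saved_name != journal_path.name:
        offset = 0  # different journal file: start from the beginning
    copied = 0
    with open(journal_path, "rb") as src, open(out_path, "ab") as dst:
        src.seek(offset)
        while True:
            line = src.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partial line still being written
            dst.write(line)
            offset += len(line)
            copied += 1
    save_state(state_path, journal_path.name, offset)
    return copied
```

Calling relay_new_lines() in a loop (or from cron) gives the stop/start
behaviour: after a restart it resumes from the saved offset instead of
replaying the whole journal.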

Optionally, rrdcached could fork()/exec() such a process each time a journal
file was close()'d.  This would delay the replay up to (flush timer) seconds
or 1GB of journal write, whichever is less.  Also, it would concentrate (in
time) the processing load on the remote machine.

With regard to what needs replication, I think it would be sufficient to
replicate these: create, update, forget.
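
As a sketch of that filter (Python, hypothetical names), assuming each
forwarded record is a text line whose first whitespace-separated word is the
command name; the actual journal layout may differ:

```python
# Commands proposed above as sufficient for replication.
REPLICATED_COMMANDS = {"create", "update", "forget"}


def should_replicate(record):
    """Return True if the record's first word is a replicated command."""
    fields = record.split(None, 1)
    return bool(fields) and fields[0].lower() in REPLICATED_COMMANDS
```

Anything else (flush, stats, and so on) would be dropped before it reaches
the second daemon.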

Open to suggestion...

-- 
 kevin brintnall =~ /kbrint at rufus.net/