Steve, I'm bringing this back to rrd-developers since there are a lot of possibilities here...

> I'm also working on another option for rrdcached, -C <address>, to chain
> update requests on to a second rrdcached, allowing you to run a hot DR
> server with constantly updated RRD files. Will probably have to chain on
> create, info and stats as well ...

I've given some thought to this too... There are a couple of problems that I can see:

(1) It's currently not possible to change much of rrdcached's running configuration. So, if the replication target ever changes, we want to avoid a shutdown/start-up sequence. (On my machine, this can take up to 15 minutes due to extensive journal replay -- which is another issue altogether.)

(2) A remote daemon that falls behind may cause the number of values queued to grow without bound. Currently the daemon discards the cached values as soon as it enqueues the file on the write queue.

(3) If the remote daemon is unreachable for 2 * (flush timer), then all the journal entries required to re-create the state will have expired. On some systems, keeping the journals around will not be a burden. On others, it will (especially if they contribute to the start-up replay time).

(4) If we do not maintain infinite journals for (3), then we are forced to use a different synchronization technique once 2 * (flush timer) has passed. This probably means export/import or scp of the RRD files.

To this end, I've been considering a second process that would "tail" the journal files and repeat the flow on to another journal. Then, this process could be stopped/started independently, reconfigured as necessary, etc. It could store a minimal amount of state (RRD journal file name, seek position) and be restarted easily.
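
As a strawman, here is roughly what that process's inner loop might look like, in C since that's what rrdcached is written in. It assumes the journal holds one complete daemon command per line and that "remote" is a stdio stream already connected to the target daemon's socket; the function name and the checkpoint handling are made up for illustration:

    #include <stdio.h>
    #include <string.h>

    /* Sketch only: forward journal lines starting at *offset to the
     * remote daemon.  The caller persists (journal_path, *offset) so
     * the process can be killed and restarted at any point. */
    int replay_journal(const char *journal_path, long *offset, FILE *remote)
    {
        FILE *jf = fopen(journal_path, "r");
        char  line[4096];

        if (jf == NULL)
            return -1;
        if (fseek(jf, *offset, SEEK_SET) != 0) {
            fclose(jf);
            return -1;
        }
        while (fgets(line, sizeof(line), jf) != NULL) {
            /* a partial line means the daemon is still writing it;
             * pick it up on the next pass */
            if (line[strlen(line) - 1] != '\n')
                break;
            if (fputs(line, remote) == EOF || fflush(remote) == EOF) {
                fclose(jf);
                return -1;      /* remote unreachable -- retry later */
            }
            *offset = ftell(jf);   /* checkpoint after each forwarded line */
        }
        fclose(jf);
        return 0;
    }

A remote that falls behind then only stalls this helper; the backlog sits in the on-disk journal rather than in rrdcached's memory.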

Optionally, rrdcached could fork()/exec() such a process each time a journal file was close()'d. This would delay the replay by up to (flush timer) seconds or 1GB of journal writes, whichever comes first. Also, it would concentrate (in time) the processing load on the remote machine.
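
If we went that route, the hook inside rrdcached could be as small as the sketch below. The helper program, its install path and its -h option are all hypothetical -- any external program that replays a journal against a remote daemon would do:

    #include <sys/types.h>
    #include <unistd.h>

    /* Sketch: called after rrdcached has close()'d a finished journal. */
    void spawn_journal_replay(const char *journal_path, const char *remote_addr)
    {
        pid_t pid = fork();

        if (pid == 0) {
            /* child: replay the whole journal against the remote daemon */
            execl("/usr/local/libexec/rrd-journal-replay",
                  "rrd-journal-replay", "-h", remote_addr, journal_path,
                  (char *) NULL);
            _exit(127);            /* exec failed */
        }
        /* parent: reap the child elsewhere (or double-fork) so the
         * daemon never blocks on a slow remote */
    }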

With regard to what needs replication, I think it would be sufficient to replicate these: create, update, forget.

Open to suggestion...

--
kevin brintnall =~ /kbrint@rufus.net/