[rrd-developers] rrdcached chaining (was: r2130 comments)

Steve Shipway s.shipway at auckland.ac.nz
Tue Sep 28 23:16:59 CEST 2010


The idea of an rrdcached that is able to copy updates on to a second remote rrdcached opens up a whole load of exciting possibilities; most obvious is the ability to set up a hot-standby DR host for your RRD databases, and have a frontend that can switch between the two.

I've written this functionality into a copy of the code here as a trial/proof of concept, and I have it up and running on two machines, forwarding updates received from MRTG via a unix socket on one machine on to a second machine via TCP.  The way I did it was to:

1)      Add an option to rrdcached, -C <address>, to specify where the chained rrdcached is located (a usage sketch follows this list).

2)      Add some code at the end of handle_request_update() to call rrdc_update with the list of updates we've just processed.  This needs a little extra code to massage the file parameter, which has to be either relative or absolute depending on whether the outgoing socket is TCP or unix domain.
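For illustration, a startup along these lines would match the trial setup described above (hostnames, paths and the port are placeholders; -C is the new option from step 1, everything else is standard rrdcached):

  # machine A: take updates from MRTG on a unix socket, relay on via TCP
  rrdcached -l unix:/var/run/rrdcached.sock -C machineB:42217

  # machine B: accept the relayed updates over TCP
  rrdcached -l machineB:42217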

This means that updates are put onto the remote queue at the same time as the local queue, rather than the local queue feeding into the remote one.  I felt this was better because (a) there is less latency between the update and the remote write, and (b) it is much simpler to implement.  The drawback, of course, is that if the remote daemon is unavailable you can lose data, as it is not buffered; but that would equally be the case if you were talking to the remote daemon directly.  In this case I log an error to syslog, which can be handled elsewhere.  If the local update succeeds but the remote fails, then (since rrdcached does not have a warn status) I return an OK, because the local update did succeed, but again log an error to syslog.  This is not ideal, but I felt it better than returning a failure: if your remote server is down, you don't want local requests to appear to fail as well.

I suppose I should note that, in this case, if you start daemon A chaining to B, and start B chaining to A, and then submit an update, it loops constantly.  Fun and excitement ensues.  But no one would do this and expect it to work, now, would they...  err, well, they might.  It might be worth adding some way for a daemon to detect that a command has been forwarded, and decide whether it should be forwarded again.
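As a rough sketch of one possible guard (purely hypothetical, not in my patch): the forwarding daemon could prepend a marker word to any command it chains onward, and a receiving daemon could then process such commands locally without forwarding them again.

  #include <string.h>

  /* Hypothetical: relayed commands arrive as e.g. "RELAY UPDATE file ts:v".
     A daemon seeing the marker applies the update locally but does not
     chain it on again, breaking the A -> B -> A cycle. */
  static int command_is_relayed (const char *buffer)
  {
    return strncmp (buffer, "RELAY ", 6) == 0;
  }

handle_request_update() would then simply skip the rrdc_update() relay step whenever command_is_relayed() returns true.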

I'd expect a full implementation to relay not just update, but also create, forget, flush and flushall.  Stats should also be recursively implemented; I'd envision something like the values returned by the remote being prefixed with 'Remote' and added to the local output.
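For example, a merged STATS response might then look something like this (values invented for illustration; the Remote* names follow the prefixing idea above):

  6 Statistics follow
  QueueLength: 0
  UpdatesReceived: 1042
  DataSetsWritten: 981
  RemoteQueueLength: 2
  RemoteUpdatesReceived: 1042
  RemoteDataSetsWritten: 975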

I wouldn't expect the replication target to change under normal circumstances, so setting it once at rrdcached startup, and requiring a restart to change it, wouldn't be a problem: you'd just set it up and leave it.

Most, if not all, of the potential problems seem to stem from placing the chain after the queue.  If you place the chain before the queue, and accept the potential (non-silent) data loss when the remote daemon is unavailable, then things are far simpler.  Likewise, if you decide that rrdcached's responsibility is only to relay commands, and not to ensure full synchronisation of the RRD files, things are much simpler.  Since the nature of the RRD means that missing data will slowly evaporate as it ages out of the archives, I think this is a fair decision to make.

I've not been making too much noise about this because I want to get the rrdcached create/info/last patches solid first, plus the MRTG patches I've been working on to take advantage of this functionality.

Steve

--- code snippets: from rrd_daemon.c, end of handle_request_update
  if (values_num < 1) {
    return send_response(sock, RESP_ERR, "No values updated.\n");
  } else if (copy_daemon != NULL) {
    /* connect (a no-op if already connected) and verify before relaying */
    rrdc_connect(copy_daemon);
    if (!rrdc_is_connected(copy_daemon)) {
        RRDD_LOG (LOG_ERR, "handle_request_update: could not connect to remote rrdcached: %s", rrd_get_error());
        rrd_clear_error();
        /* the local update succeeded, so still report OK (see above) */
        return send_response(sock, RESP_OK,
            "Errors, enqueued %i value(s) but could not connect to remote daemon.\n", values_num);
    }
    /* If we are chaining unix->unix or tcp->tcp, all is well as we preserve
       orig_file.  However, for tcp->unix we need 'file' (i.e. with the full
       path), and for unix->tcp we need to REMOVE the default path prefix. */
    if (!strncmp(copy_daemon, "unix:", 5) || (*copy_daemon == '/')) {
       /* going to a unix socket: 'file' is already expanded */
    } else { /* going to tcp: strip the base dir prefix if present */
       file = orig_file;
       if (!strncmp(file, config_base_dir, _config_base_dir_len)) {
           file += _config_base_dir_len + 1; /* skip path and separator */
       }
    }
    status = rrdc_update(file, values_num, (const char * const *) values_arr);
    if (status != 0) {
        RRDD_LOG (LOG_ERR, "handle_request_update: could not perform remote update: %s", rrd_get_error());
        rrd_clear_error();
        return send_response(sock, RESP_OK,
            "Errors, enqueued %i value(s) but could not relay.\n", values_num);
    }
    return send_response(sock, RESP_OK,
                         "Update successful, enqueued and relayed %i value(s).\n", values_num);
  } else {
    return send_response(sock, RESP_OK,
                         "Update successful, enqueued %i value(s).\n", values_num);
  }

--- from read_options
  while ((option = getopt(argc, argv, "Ogl:s:m:P:f:w:z:t:Bb:p:Fj:a:hC:?")) != -1)
  {
    switch (option)
    {
      case 'C': /* address of the chained (remote) rrdcached */
        copy_daemon = strdup (optarg);
        break;

________________________________
Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: s.shipway at auckland.ac.nz
Please consider the environment before printing this e-mail


From: kevin brintnall [mailto:kbrint at rufus.net]
Sent: Wednesday, 29 September 2010 2:11 a.m.
To: Steve Shipway
Cc: rrd-developers at lists.oetiker.ch
Subject: Re: [FORGED] r2130 comments : rrd_parsetime, create_set_no_overwrite

Steve, I'm bringing this back to rrd-developers since there are a lot of possibilities here...

> I'm also working on another option for rrdcached, -C <address>, to chain update requests on to a second rrdcached, allowing you to run a hot DR server with constantly updated RRD files.  Will probably have to chain on create, info and stats as well ...

I've given some thought to this too...  There are a couple of problems that I can see:

(1) it's currently not possible to change much of rrdcached's running configuration.  So, if the replication target ever changes, we want to avoid a shutdown/start-up sequence.  (On my machine, this can take up to 15 minutes due to extensive journal replay, which is another issue altogether.)

(2) A remote daemon that falls behind may cause the number of values queued to grow without bound.  Currently the daemon discards the cached values as soon as it enqueues the file on the write queue.

(3) If the remote daemon is unreachable for 2 * (flush timer), then all the journal entries required to re-create the state will have been expired.  On some systems, keeping the journals around will not be a burden.  On others, it will (especially if they contribute to the start-up replay time).

(4) If we do not maintain infinite journals for (3), then we are forced to use a different synchronization technique after we've passed 2*(flush timer).  This probably includes export/import or scp of the RRD files.

To this end, I've been considering a second process that will "tail" the journal files and repeat the flow back to another journal.  Then, this process could be stopped/started independently, reconfigured as necessary, etc.  It could store a minimal amount of state (rrd journal file name, seek position) and be restarted easily.
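A minimal sketch of that tailer's inner loop, assuming the journal is a plain text file of one rrdcached command per line, and with forward() standing in for whatever sends a line on to the remote daemon (both assumptions on my part):

  #include <stdio.h>
  #include <string.h>

  /* Re-open the journal, seek to where we left off, and forward each
     complete line; return the new offset so the caller can persist its
     (journal file name, seek position) state and be restarted easily. */
  static long tail_journal (const char *journal_path, long offset,
                            int (*forward) (const char *line))
  {
    FILE *fp = fopen (journal_path, "r");
    char line[4096];

    if (fp == NULL)
      return offset;          /* journal rotated away; move to the next one */
    if (fseek (fp, offset, SEEK_SET) != 0)
    {
      fclose (fp);
      return offset;
    }
    while (fgets (line, sizeof (line), fp) != NULL)
    {
      size_t len = strlen (line);
      if (len == 0 || line[len - 1] != '\n')
        break;                /* partial line at EOF; re-read it next pass */
      forward (line);         /* e.g. write the command to the remote socket */
      offset = ftell (fp);    /* only advance past complete, forwarded lines */
    }
    fclose (fp);
    return offset;
  }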

Optionally, rrdcached could fork()/exec() such a process each time a journal file was close()'d.  This would delay the replay by up to (flush timer) seconds or 1GB of journal writes, whichever comes first.  Also, it would concentrate (in time) the processing load on the remote machine.

With regard to what needs replication, I think it would be sufficient to replicate these: create, update, forget.
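Concretely, the replication stream would just repeat the accepted wire commands verbatim, something like this (file name and values are placeholders; a CREATE line would be relayed the same way once the syntax from Steve's pending create patch lands):

  UPDATE traffic.rrd 1285700400:42:17
  FORGET traffic.rrd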

Open to suggestion...

--
 kevin brintnall =~ /kbrint at rufus.net/