[rrd-developers] implementing portable format

Mon Nov 3 03:39:44 CET 2008

On Sun, Nov 02, 2008 at 08:42:31AM +0100, Tobias Oetiker wrote:
> > * UPDATEV support?  It would require the daemon to keep a copy of the RRD
> >   header of each file in memory, and perform the same calculations that
> >   will ultimately be performed on the real file.
> 
> hmm, you will effectively reimplement the disk-cache ... to be able
> to do the calculations you will have to keep everything except for
> the RRA data space in memory ... you would have to change the rrd_update code
> accordingly.

We'd have to decouple the calculation of new values from the writing of
those values to the file.  I believe there is already some separation of
those two inside rrd_update_r.

> to me this looks more like a 1.5 feature (if at all) maybe with the
> modified data access structure in 1.5, this could be implemented
> much easier by hiding the header cache capability inside the
> rrd_open code.

Agreed.  Let's leave it for later.

> > * expose BATCH mode to API/bindings?
> >   (for my setup I wrote my own protocol speaker and talk directly to the
> >   daemon for high update rate)
> >
> >   + enables higher update rates
> >   - client won't get return codes
> >     (i.e. you can't know a FLUSH has completed while you're still in BATCH
> >     mode).
> 
> yes this certainly makes sense, does rrd_update with multiple
> arguments use the batch mode ?

RRD update with multiple arguments just passes one large line through to
the daemon.

  rrdtool update x.rrd V1 V2 V3
  rrdtool update y.rrd V4 V5

... results in this protocol exchange:

  C: UPDATE x.rrd V1 V2 V3
  S: 0 Enqueued 3 updates
  C: UPDATE y.rrd V4 V5
  S: 0 Enqueued 2 updates

This means that each RRD requires one small read() and one small write()
on both client and server.  That adds up to a lot of system calls, which is
what batch mode is designed to get around.

BATCH mode delays the status code until the very end.  It is not a
property of a single update to a file, but a way to initiate a series of
updates.  With "BATCH" mode, the protocol looks like:

  C: BATCH
  S: 0 Go ahead.  End with dot '.' on its own line.
  C: UPDATE x.rrd V1 V2 V3
  C: UPDATE y.rrd V4 V5
  C: UPDATE /nofile V6
  C: ... as many updates as needed
  C: .
  S: 1 errors
  S: 3 No such file: /nofile

Now, the client may fill each write() with as many commands as possible.
It's quite likely that all of the updates above would be sent in a single
write().
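
To make the single-write() point concrete, here is a rough sketch of how a
client might assemble the body of a batch in memory and push it out in one
go.  The helper names (batch_append, batch_build, batch_send) are made up
for illustration and are not part of rrd_client.c; the sketch also assumes
"BATCH" has already been sent and acknowledged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append one command line plus '\n' to buf.
 * Returns 0 on success, -1 if the line does not fit. */
static int batch_append(char *buf, size_t cap, size_t *len, const char *line)
{
    assert(buf != NULL && len != NULL && line != NULL);
    if (*len >= cap)
        return -1;
    int n = snprintf(buf + *len, cap - *len, "%s\n", line);
    if (n < 0 || (size_t)n >= cap - *len)
        return -1;          /* would truncate: caller should send and retry */
    *len += (size_t)n;
    return 0;
}

/* Hypothetical helper: queue every update, then the terminating dot. */
static int batch_build(char *buf, size_t cap, size_t *len,
                       const char *const *updates, size_t count)
{
    *len = 0;
    for (size_t i = 0; i < count; i++)
        if (batch_append(buf, cap, len, updates[i]) != 0)
            return -1;
    return batch_append(buf, cap, len, ".");
}

/* Hypothetical helper: send the buffer.  write() may be partial, so loop;
 * in the common case the whole batch goes out in one call. */
static int batch_send(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would fill an array with the "UPDATE ..." lines, call batch_build()
once, and hand the result to batch_send() with the daemon socket.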

Because the results are not returned until the very end, it's not possible
to return a meaningful result code from rrd_update().  We'd have to fake
the results (i.e. always succeed) or modify the API to return the result
code asynchronously (and keep a lot of state).
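
For reference, the deferred status could be read along these lines.  This
is only a sketch: it assumes, going by the example above, that the first
number is the count of error lines that follow and that each error line
starts with the index of the failing command.  struct batch_error and
batch_read_errors() are invented names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: one failed command from a finished batch. */
struct batch_error {
    int  line;          /* index of the failing command within the batch */
    char msg[256];      /* message text as sent by the daemon */
};

/* Read "<n> errors" followed by n lines of "<line> <message>".
 * Stores at most max entries; returns the number stored, -1 on bad input. */
static int batch_read_errors(FILE *in, struct batch_error *errs, int max)
{
    char buf[256];
    int count;
    int stored = 0;

    assert(in != NULL && errs != NULL);
    if (fgets(buf, sizeof buf, in) == NULL
        || sscanf(buf, "%d", &count) != 1 || count < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        int line;
        int off = 0;

        if (fgets(buf, sizeof buf, in) == NULL)
            return -1;
        /* %n records where the message text begins */
        if (sscanf(buf, "%d %n", &line, &off) != 1)
            return -1;
        if (stored < max) {
            buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line ending */
            errs[stored].line = line;
            snprintf(errs[stored].msg, sizeof errs[stored].msg,
                     "%s", buf + off);
            stored++;
        }
    }
    return stored;
}
```

Note that the errors only come back as a list tied to positions in the
batch, which is why they can't be mapped onto individual rrd_update() return
values without keeping extra state.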

The only approach I can see with the existing API is to always return
success when the client is in batch mode.

I think the right approach is:

 * global state in rrd_client.c to indicate whether we're in BATCH mode
   (maintained in the same places as sd and sh)
 * response_read() fakes a success code if (batch_mode)
 * request() skips fflush() if (batch_mode)
 * rrdc_batch_{start,stop} to start/stop batch mode in the client
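
Roughly, the pieces in the list above might fit together as follows.  This
is a simplified sketch, not the real rrd_client.c code: the signatures here
are stand-ins (separate FILE pointers for each direction instead of the
existing sd/sh globals), and error handling is minimal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical global; would live next to sd and sh. */
static int batch_mode = 0;

/* Send one command line.  Outside batch mode, flush right away; inside
 * batch mode, leave the line in the stdio buffer so writes coalesce. */
static int request(FILE *out, const char *line)
{
    assert(out != NULL && line != NULL);
    if (fputs(line, out) == EOF || fputc('\n', out) == EOF)
        return -1;
    if (!batch_mode && fflush(out) != 0)
        return -1;
    return 0;
}

/* Read one status line.  In batch mode the daemon has not answered yet,
 * so report success without touching the stream. */
static int response_read(FILE *in, int *status)
{
    char buf[256];

    assert(in != NULL && status != NULL);
    if (batch_mode) {
        *status = 0;            /* faked: the real result is not known yet */
        return 0;
    }
    if (fgets(buf, sizeof buf, in) == NULL
        || sscanf(buf, "%d", status) != 1)
        return -1;
    return 0;
}

/* Enter batch mode.  Returns 0 on success, -1 on failure. */
static int rrdc_batch_start(FILE *out, FILE *in)
{
    int status;

    if (batch_mode)
        return -1;
    if (request(out, "BATCH") != 0
        || response_read(in, &status) != 0 || status != 0)
        return -1;
    batch_mode = 1;
    return 0;
}

/* Leave batch mode.  Returns the error count reported by the daemon,
 * or -1 on failure. */
static int rrdc_batch_stop(FILE *out, FILE *in)
{
    char buf[256];
    int status;

    if (!batch_mode)
        return -1;
    batch_mode = 0;     /* clear first so "." is flushed and the reply read */
    if (request(out, ".") != 0 || response_read(in, &status) != 0)
        return -1;
    /* Drain the per-command error lines so the stream stays in sync. */
    for (int i = 0; i < status; i++)
        if (fgets(buf, sizeof buf, in) == NULL)
            return -1;
    return status;
}
```

With this shape, the only place the caller learns about failures is the
return value of rrdc_batch_stop(); everything sent in between reports
success.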

Then, we could expose rrdc_batch_start() and rrdc_batch_stop() via the C
API and various bindings.

How does that sound?

-- 
 kevin brintnall =~ /kbrint at rufus.net/