[rrd-developers] implementing portable format

Sun Nov 2 12:42:13 CET 2008

Hi Kevin and Tobi,

On Sat, Nov 01, 2008 at 11:56:17PM -0500, kevin brintnall wrote:
> On Sat, Nov 01, 2008 at 09:33:46AM +0100, Tobias Oetiker wrote:
> > > Maybe we can start to tidy up for 1.4 now (while the storage code is
> > > relatively stable) and target the storage changes (portable format,
> > > different backends) for 1.5?
> > 
> > yes you are right, this might be sensible ... release often,
> > release early ...

Sounds good to me. Since we've only got new features and (iirc) no
backward-incompatible changes, updates should be pretty
straightforward.

> > what work do you see for the cached ?
> 
> Here's what I'm currently pondering..  I'm interested in your thoughts on
> the new features...
> 
> -----------------------------------------------------------------
> 
> NEW FEATURES:
> 
> * UPDATEV support?  It would require the daemon to keep a copy of the RRD
>   header of each file in memory, and perform the same calculations that
>   will ultimately be performed on the real file.
> 
>   - higher memory utilization

Do you know any numbers for that? I suppose that this would have a
fairly large impact on large setups. I'd rather use the available memory
to cache real data than to support UPDATEV. So, if this gets
implemented, imho it should be optional. I'm not sure that's worth the
effort, though, as it would presumably add quite some complexity.

> * expose BATCH mode to API/bindings?
>   (for my setup I wrote my own protocol speaker and talk directly to the
>   daemon for high update rate)
> 
>   + enables higher update rates
>   - client won't get return codes

(Disclaimer: I did not really follow that, so I might repeat stuff or
just talk plain bullshit - if so, please tell me ;-))

I suppose the goal is to be able to run a series of commands in a row
with a high update rate but _not_ to be able to continuously run
commands forever.

Just some random ideas how to implement this (as part of the API and
presumably the daemon / underlying protocol as well):

  * For each command that should be supported in BATCH mode, add an
    appropriate BATCH_* command.

  * Add a new command BATCH_FINISH (or something like that) that tells
    the daemon that the current job is finished. Now, the client will
    block until all commands are done and the return code / values are
    available.

  * Now, we need some way to get those return codes / values. I see two
    ways how to do that:

    - BATCH_* returns a handle identifying the command. BATCH_FINISH
      then returns the appropriate handle followed by the return values
      of the command. This happens for one command at a time in no
      fixed order (presumably the order in which the commands
      finished). The daemon needs to cache those return values until
      they can be sent to the client. A special handle might be used to
      indicate the end of return values.

    - BATCH_* does not return anything. The command handles are used
      internally only. BATCH_FINISH will continuously output the return
      codes / values in the order in which the commands were started.

    I prefer the former solution. It might add some complexity but it
    should be much more powerful and flexible.

  * It might make sense to add an option to BATCH_* indicating that the
    return value is not interesting. In that case, BATCH_FINISH will not
    care about that command and simply skip it. If all commands have
    been marked as "uninteresting", BATCH_FINISH will return
    immediately using status 0.
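
To make the handle bookkeeping from the list above a bit more concrete,
here is a minimal sketch in Python (just for brevity). Everything in it
is hypothetical and not part of rrdcached: the names BatchSession,
dispatch, finish and END_HANDLE, as well as the "execute" callback
standing in for the real command implementation. Commands are run
sequentially in this toy, so results come back in dispatch order rather
than in completion order.

```python
import itertools

END_HANDLE = 0  # assumed special handle meaning "no further results"


class BatchSession:
    """Toy model of the daemon-side state for one batch (hypothetical)."""

    def __init__(self, execute):
        # execute(command, args) -> list of result lines; a stand-in for
        # whatever the daemon would really do for UPDATE, FLUSH, ...
        self._execute = execute
        self._handles = itertools.count(1)
        self._queued = []  # entries: (handle, command, args, interesting)

    def dispatch(self, command, args, interesting=True):
        """Model of BATCH_<command>: queue the command, return its handle."""
        handle = next(self._handles)
        self._queued.append((handle, command, args, interesting))
        return handle

    def finish(self):
        """Model of BATCH_FINISH: run everything, report interesting results."""
        results = []
        for handle, command, args, interesting in self._queued:
            lines = self._execute(command, args)
            if interesting:
                # Cached until it can be sent to the client.
                results.append((handle, command, lines))
        self._queued.clear()
        # The special handle terminates the list of results.
        results.append((END_HANDLE, None, []))
        return results
```

If every command was dispatched with interesting=False, finish() returns
only the terminating entry, which corresponds to BATCH_FINISH returning
immediately.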

This way, it's not necessary to change the basic format of the protocol
(i.e. a status line "<num> <random text>", where <num> < 0 indicates an
error and any other value is the number of subsequent lines being
returned). The only exception would be BATCH_FINISH, which does not
return the total number of returned lines up front but returns multiple
status lines, each followed by the appropriate return values. Imho,
this would still be fairly consistent and straightforward.
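
For illustration, a client could read one response in that basic format
roughly like this. It is only a sketch in Python; the function names
parse_status_line and read_response are made up for this example and do
not exist in any rrdtool binding.

```python
def parse_status_line(line):
    """Split a status line '<num> <text>' into (num, text)."""
    num_str, _, text = line.strip().partition(" ")
    return int(num_str), text


def read_response(lines):
    """Consume one response from an iterator of protocol lines.

    Returns (status, message, payload). A negative status indicates an
    error and carries no payload; otherwise the status is the number of
    payload lines that follow the status line.
    """
    status, message = parse_status_line(next(lines))
    if status < 0:
        return status, message, []
    payload = [next(lines).rstrip("\n") for _ in range(status)]
    return status, message, payload
```

The caller passes an iterator (e.g. a socket wrapped as a file object),
so several responses can be read one after another from the same
stream.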

Did I miss anything? Any comments or further ideas?

An example session might look like this:

 -> BATCH_UPDATE ...
 <- 1 batch command dispatched successfully
 <- 1234
 -> BATCH_FLUSH -uninteresting ...
 <- 1 batch command dispatched successfully
 <- 3456
 -> BATCH_PENDING ...
 <- 1 batch command dispatched successfully
 <- 5678
 ...

... where 1234, 3456 and 5678 are the returned command handles. If we
don't export the handle to the user, the return value would look like:

 <- 0 batch command dispatched successfully

Once we're done:

 -> BATCH_FINISH
 <- 3 command PENDING finished
 <- 5678
 <- <pending update 1>
 <- <pending update 2>

 <- 1 command UPDATE finished
 <- 1234

 <- 1 all batch commands finished
 <- 0

Or (returning two status lines for each command, one returning the
handle and the other one being the usual output of the command):

 -> BATCH_FINISH
 <- 1 command PENDING finished
 <- 5678
 <- 2 pending updates
 <- <pending update 1>
 <- <pending update 2>

 <- 1 command UPDATE finished
 <- 1234
 <- 0 success

 <- 1 all batch commands finished
 <- 0

(Empty lines separating the output for the different commands are used
for clarification only and are not part of the protocol.)

The "0" on the last line is the special "batch finished" handle
indicating that no further results are available. Instead of that, a
line like "0 all batch commands finished" might be used to terminate the
output.
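
On the client side, collecting the output of BATCH_FINISH in the first
variant (one status line per command, with the handle as the first
returned line) might look roughly like the following Python sketch. The
function name parse_batch_finish is hypothetical. It accepts both
terminators discussed above: the special handle 0 and a status line
with a count of 0.

```python
def parse_batch_finish(lines):
    """Collect per-command results from BATCH_FINISH output.

    Returns a dict mapping each command handle to its list of return
    values. Empty lines are skipped, since they only separate the
    commands for readability.
    """
    results = {}
    it = iter(lines)
    for status_line in it:
        status_line = status_line.strip()
        if not status_line:
            continue
        count = int(status_line.split(" ", 1)[0])
        if count < 0:
            raise RuntimeError("daemon reported an error: " + status_line)
        if count == 0:
            break  # alternative terminator: "0 all batch commands finished"
        block = [next(it).strip() for _ in range(count)]
        handle = int(block[0])
        if handle == 0:
            break  # special "batch finished" handle
        results[handle] = block[1:]
    return results
```

Since the results are keyed by handle, the client does not need to rely
on the order in which the daemon reports them.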

What do you think?

.oO( Darn, this got much longer than expected ... ;-))

Cheers,
Sebastian

-- 
Sebastian "tokkee" Harl +++ GnuPG-ID: 0x8501C7FC +++ http://tokkee.org/

Those who would give up Essential Liberty to purchase a little Temporary
Safety, deserve neither Liberty nor Safety.         -- Benjamin Franklin
