[rrd-developers] rrd cache work

Tue Sep 2 18:15:14 CEST 2008

On Tue, Sep 02, 2008 at 11:33:45AM +0200, Florian Forster wrote:
> >   rrdc_flush_if_daemon(opt_daemon, filename);
> 
> Sounds good :) I'd rather not define that function in `rrd_client.h'
> though, since that file is made available to the world (i.e. it's
> installed to $prefix/include/), but `rrdc_flush_if_daemon' clearly is an
> internal RRDTool function.. See my notes about `rrdc_is_connected' below
> for a suggestion on how to make this functionality generally available to
> programs using the library.

We could do it in a few different ways:

 * forward declarations in the relevant *.c files
 * internal-only include file
 * change rrd_flush to a no-op if no daemon connection
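
To make the third option concrete, here is a rough sketch of a flush that
quietly does nothing when no daemon is involved. Only the name
`rrdc_flush_if_daemon' and its two arguments come from this thread; the
`sketch_*' helpers and counters are hypothetical stand-ins, not the real
client code, and the return conventions are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real client calls. They only record
 * what happened so that the control flow can be followed. */
static int sketch_connected = 0;    /* 1 while a daemon connection is open */
static int sketch_flush_count = 0;  /* number of flushes actually sent */

static int sketch_connect(const char *daemon_addr)
{
    assert(daemon_addr != NULL);
    sketch_connected = 1;           /* a real client would open a socket */
    return 0;
}

static int sketch_flush(const char *filename)
{
    assert(filename != NULL);
    if (!sketch_connected)
        return 0;                   /* option 3: no daemon, nothing to do */
    sketch_flush_count++;
    return 0;
}

/* Assumed shape of the helper named in the thread: connect only when a
 * daemon address was given, then flush. Without an address it is a no-op. */
static int rrdc_flush_if_daemon(const char *opt_daemon, const char *filename)
{
    if (opt_daemon == NULL || opt_daemon[0] == '\0')
        return 0;
    if (sketch_connect(opt_daemon) != 0)
        return -1;
    return sketch_flush(filename);
}
```

With the no-op behaviour inside the flush itself, callers would not need
to check for a daemon before calling it.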

> Oh, actually `N:' *should* work.. If not, the implementation in
> `buffer_add_value' is broken.. It's been some time, but I *think* I
> tested that at some point..

You're right, I missed that code.

> at-style time is definitely broken, though. Is anybody actually using
> that? AFAIK parsing at-style time is not thread-safe, though, so
> I'd do it just like `rrd_update_r' and document it as not working. That
> way all the `rrdc_*' functions stay thread-safe from the beginning.

It doesn't appear that @time is very widely used..  I'd be OK with just
throwing an error..

> > I noticed that some of the other methods that deal with several RRDs
> > (i.e. graph, xport) re-use a single connection.  I am wondering if we
> > should generalize this approach as follows:
>
> I agree, having a ``global'' connection would be a good thing. I'd move
> connecting to the daemon up, though. By that I mean that we introduce a
> function such as
>   rrdc_is_connected
> which is used by all the API functions (update, graph, xport, ...) to
> check whether a connection to the daemon exists or not. Parsing of the
> `--daemon' argument and connecting to the daemon would then be done by
> the `rrdtool' binary instead.

I don't think we can move this functionality to the rrdtool program, since
there are a bunch of other language bindings that need to be able to
handle the --daemon argument as well.
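
As a rough illustration of keeping this in the library: a single shared
connection could be opened on first use and reused by every call that
passes the same daemon address, so the bindings would not need their own
connection code. All names here are hypothetical (`sketch_is_connected'
only mirrors the `rrdc_is_connected' idea quoted above), and the "connect"
step just records the address instead of opening a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Address of the currently open connection, "" if there is none. */
static char sketch_addr[256] = "";
static int  sketch_connect_calls = 0;  /* how often we really connected */

/* Is there already a connection to this daemon address? */
static int sketch_is_connected(const char *addr)
{
    return addr != NULL
        && sketch_addr[0] != '\0'
        && strcmp(sketch_addr, addr) == 0;
}

/* Called at the top of each library entry point (update, graph, ...)
 * that received a daemon address. Reuses the connection when possible. */
static int sketch_ensure_connected(const char *addr)
{
    assert(addr != NULL);
    if (sketch_is_connected(addr))
        return 0;                       /* reuse the existing connection */
    if (strlen(addr) >= sizeof(sketch_addr))
        return -1;                      /* address too long for the sketch */
    strcpy(sketch_addr, addr);          /* real code would connect here */
    sketch_connect_calls++;
    return 0;
}
```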

> Other programs and scripts would need to call `connect' and `disconnect'
> themselves. The environment variable should be interpreted by the
> `rrdtool' binary only, but *not* by the library. If other programs want
> to use the same library to behave in a similar way, they should parse it
> themselves and call `connect' accordingly. To address your Perl related
> remark above: I'd rather export `connect' and `disconnect' to Perl than
> having magic happen.. There's enough magic in Perl already ;)
>
> In my patch I've added the connection stuff to the (non-threadsafe) API
> functions rather than the `rrdtool' command. I think this has been a bad
> choice and I now think the implementation described above would be
> superior by far.
> 
> So, in conclusion, yes, I think having one connection for all your
> caching needs is desirable, but connection handling should be done
> explicitly by the program using the library (which may happen to be the
> `rrdtool' command)..

I rather like the idea of introducing this feature without changes to the
existing API.  I'd like to see us continue in the current direction.  By
forcing changes upstream, we make it harder to adopt this enhancement.

Anyone else have any feeling on the issue?

> > When creating new cache_item_t in handle_request_update,
> > we should skew the time as follows:
> > 
> >   ci->last_flush_time = now + random() % config_flush_interval;
> 
> I see your point, but I think a much more effective way of avoiding IO
> problems is by throttling the rate at which RRD files are written. If
> you set this to, say, 20 updates per second, your system will stay
> responsive and all data will end up on permanent storage eventually.
> `Flush'ed values ignore this speed-limit, of course.. I've implemented
> this for the `rrdtool' plugin in `collectd' and it works like a charm :)

I see your point.  There are problems with the throttling approach at
either extreme.

 * if the rate is too high, it becomes the same problem that we have now

 * if the rate is too low, then client applications may block for a long
   time waiting for flushes, while the hardware is idle (think graph with
   a lot of RRDs).
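
To have something concrete to reason about, here is a minimal sketch of
such a throttle, assuming a simple per-second counter. This is not the
collectd implementation, which I have not reproduced here; all names are
hypothetical and the limit of 20 is just the figure quoted above.

```c
#include <assert.h>
#include <time.h>

typedef struct {
    int    max_per_second;  /* e.g. 20, the figure quoted above */
    time_t window_start;    /* the second the current window began */
    int    written;         /* throttled writes in the current window */
} sketch_throttle_t;

/* Returns 1 if a write may go to disk now, 0 if it should stay queued.
 * Explicit flushes bypass the limit and are not counted against it. */
static int sketch_may_write(sketch_throttle_t *t, time_t now, int is_flush)
{
    assert(t != NULL && t->max_per_second > 0);
    if (is_flush)
        return 1;
    if (now != t->window_start) {   /* new second: start a fresh window */
        t->window_start = now;
        t->written = 0;
    }
    if (t->written >= t->max_per_second)
        return 0;
    t->written++;
    return 1;
}
```

Both extremes show up directly in `max_per_second': a large value lets
almost every write through, and a small one leaves ordinary writes queued
even when the disk has nothing else to do.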

I'll experiment once I get it up in my environment.

-- 
 kevin brintnall =~ /kbrint at rufus.net/