[rrd-developers] rrd cache work

Florian Forster rrdtool at nospam.verplant.org
Tue Sep 2 11:33:45 CEST 2008


Hi Kevin,

On Mon, Sep 01, 2008 at 04:09:42AM -0500, kevin brintnall wrote:
> I am working on a few changes...  Once I have completed testing I will
> submit the patch back to you and tobi.

great :)

>   rrdc_flush_if_daemon(opt_daemon, filename);

Sounds good :) I'd rather not declare that function in `rrd_client.h'
though, since that file is made available to the world (i.e. it's
installed to $prefix/include/), but `rrdc_flush_if_daemon' clearly is an
internal RRDTool function.. See my notes about `rrdc_is_connected' below
for a suggestion on how to make this functionality generally available
to programs using the library.
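
What I mean is something like this (just a sketch; the return type is an
assumption on my part and `rrd_tool.h' is only a guess for a suitable
internal header, any header that is *not* installed would do):

  /* rrd_tool.h or some other header that is NOT installed to
   * $prefix/include/ -- sketch only */
  int rrdc_flush_if_daemon (const char *opt_daemon, const char *filename);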

> Also, it appears that the update strings are passed directly through
> the daemon without modification.  However, when we see an update like
> "N:1:3:5:7:9", the time that ultimately gets passed through to
> _rrd_update should be based on when the "N:" was received at the
> daemon, not when it was flushed out to disk.  Likewise, @-time is
> broken.

Oh, actually `N:' *should* work.. If not, the implementation in
`buffer_add_value' is broken.. It's been some time, but I *think* I
tested that at some point..
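
To illustrate what I mean: the `N' should be replaced with the current
time right when the update is handed to the cache, so that flushing it
later does not change the timestamp. Roughly like this (a sketch only;
`buffer_add_string' and the buffer parameters are assumptions, not the
actual patch):

  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  /* assumed helper from the caching code */
  int buffer_add_string (const char *str, char **buffer, size_t *buffer_free);

  static int buffer_add_value (const char *value, /* e.g. "N:1:3:5:7:9" */
      char **buffer, size_t *buffer_free)
  {
    char temp[4096];

    if (strncmp (value, "N:", 2) == 0)
    {
      /* replace "N" with the current time right away, so a later
       * flush does not change the timestamp */
      snprintf (temp, sizeof (temp), "%lu:%s",
          (unsigned long) time (NULL), value + 2);
      temp[sizeof (temp) - 1] = 0;
      value = temp;
    }

    return (buffer_add_string (value, buffer, buffer_free));
  }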

at-style time is definitely broken, though. Is anybody actually using
that? AFAIK parsing at-style time is not thread-safe, so I'd handle it
just like `rrd_update_r' does and document it as not working. That way
all the `rrdc_*' functions stay thread-safe from the beginning.
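
By ``just like `rrd_update_r''' I mean accepting only `N' or a plain
numeric timestamp and returning an error for everything else, something
along these lines (sketch only, the function name is made up):

  #include <errno.h>
  #include <stdlib.h>
  #include <time.h>

  /* accept "N" or a plain epoch timestamp, refuse at-style time
   * instead of going through the non-thread-safe parser */
  static int parse_update_time (const char *str, time_t *ret_time)
  {
    char *endptr = NULL;
    long long tmp;

    if ((str[0] == 'N') && (str[1] == 0))
    {
      *ret_time = time (NULL);
      return (0);
    }

    errno = 0;
    tmp = strtoll (str, &endptr, /* base = */ 10);
    if ((errno != 0) || (endptr == str) || (*endptr != 0))
      return (-1); /* at-style or otherwise unparsable */

    *ret_time = (time_t) tmp;
    return (0);
  }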

> I envision re-using get_time_from_reading().  However, I'd move the
> things that sanity-check against the RRD (starting with "if (version <
> 3)" into parse_ds (which is the only other caller of
> get_time_from_reading())..  Then, we can use this function in the
> daemon without prior knowledge of the RRD's structure.

That would be the best thing to do *if* we can get that at-style parsing
stuff to be thread-safe.. I haven't looked at the code yet, though, and
have no clue whether it's a library issue or just the use of `strtok' or
something.. (I.e. I have no idea how hard it would be to make that
parsing thread-safe.)
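
If it really is just `strtok', the fix would be mostly mechanical,
e.g. (purely illustrative):

  #include <string.h>

  /* re-entrant tokenizing with strtok_r instead of strtok */
  static void split_fields (char *buffer)
  {
    char *saveptr = NULL;
    char *field;

    for (field = strtok_r (buffer, ":", &saveptr);
         field != NULL;
         field = strtok_r (NULL, ":", &saveptr))
    {
      /* handle one field */
    }
  }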

> In my environment, I have a process that calculates a very large
> number of RRD updates (1 per file), and then issues them in a loop
> with RRDs::update (perl).  With the current code, this will create a
> lot of unnecessary connect/disconnect.

> I noticed that some of the other methods that deal with several RRDs
> (i.e. graph, xport) re-use a single connection.  I am wondering if we
> should generalize this approach as follows:

I agree, having a ``global'' connection would be a good thing. I'd move
connecting to the daemon up, though. By that I mean that we introduce a
function such as
  rrdc_is_connected
which is used by all the API functions (update, graph, xport, ...) to
check whether a connection to the daemon exists or not. Parsing of the
`--daemon' argument and connecting to the daemon would then be done by
the `rrdtool' binary instead.
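
Roughly like this (just a sketch, the exact signatures are of course up
for discussion):

  /* rrd_client.c -- sketch only */
  #include <unistd.h>

  static int sd = -1; /* socket to the daemon, -1 if not connected */

  int rrdc_is_connected (void)
  {
    return (sd >= 0);
  }

  int rrdc_disconnect (void)
  {
    if (sd < 0)
      return (0);
    close (sd);
    sd = -1;
    return (0);
  }

Each API function would then simply start with a check like
`if (rrdc_is_connected ())' and fall back to accessing the files
directly otherwise.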

Other programs and scripts would need to call `connect' and `disconnect'
themselves. The environment variable should be interpreted by the
`rrdtool' binary only, *not* by the library. If other programs using the
library want the same behaviour, they should parse the variable
themselves and call `connect' accordingly. To address your Perl-related
remark above: I'd rather export `connect' and `disconnect' to Perl than
have magic happen.. There's enough magic in Perl already ;)
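
For the `rrdtool' binary (or a script using the bindings) the connection
handling would then look roughly like this; `opt_daemon', `rrdc_connect'
and `rrdc_disconnect' are the names used so far in this thread, the
exact prototypes are assumptions:

  /* sketch: connection handling in the program, not in the library */
  int rrdc_connect (const char *daemon_addr);
  int rrdc_disconnect (void);
  int rrd_update (int argc, char **argv);

  static int do_update (const char *opt_daemon, int argc, char **argv)
  {
    int status;

    if (opt_daemon != NULL) /* --daemon given, or the binary read the
                               environment variable itself */
    {
      status = rrdc_connect (opt_daemon);
      if (status != 0)
        return (status);
    }

    status = rrd_update (argc, argv); /* uses the daemon connection if
                                         rrdc_is_connected() says so */

    rrdc_disconnect ();
    return (status);
  }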

In my patch I added the connection handling to the (non-thread-safe)
API functions rather than to the `rrdtool' command. I think that was a
bad choice; the approach described above would be superior by far.

So, in conclusion, yes, I think having one connection for all your
caching needs is desirable, but connection handling should be done
explicitly by the program using the library (which may happen to be the
`rrdtool' command)..

> Lots of updates are going to be queued at (0 mod
> config_flush_interval).  This risks creating the "thundering herd"
> problem that we're trying to avoid with delayed updates (albeit less
> frequently).  When creating new cache_item_t in handle_request_update,
> we should skew the time as follows:
> 
>   ci->last_flush_time = now + random() % config_flush_interval;

I see your point, but I think a much more effective way of avoiding IO
problems is to throttle the rate at which RRD files are written. If you
set this to, say, 20 updates per second, your system will stay
responsive and all data will end up on permanent storage eventually.
`Flush'ed values ignore this speed limit, of course.. I've implemented
this for the `rrdtool' plugin in `collectd' and it works like a charm :)
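
The core of it is trivial, something along these lines (a sketch of the
idea only; the queue helpers and `config_write_rate' are made-up names):

  #include <unistd.h>

  /* assumed queue helpers */
  void *queue_pop (void);          /* oldest queued item, NULL when empty */
  void write_one_file (void *ci);  /* does the actual rrd_update_r() */

  static double config_write_rate = 20.0; /* RRD files per second */

  static void write_queued_files (void)
  {
    void *ci;

    while ((ci = queue_pop ()) != NULL)
    {
      write_one_file (ci);
      /* sleep so that at most config_write_rate files are written per
       * second; explicit FLUSHes would skip this sleep */
      usleep ((useconds_t) (1000000.0 / config_write_rate));
    }
  }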

I'd be willing to port or re-implement this feature and to make further
changes as outlined above, but I have better things to do than implement
stuff that then lies around for months. So I'll wait until my
work-so-far makes it into the repository before investing even more time
into this code..

Regards,
-octo
-- 
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/