[rrd-developers] rfc: later caching

Mon Apr 20 08:11:23 CEST 2009

Hi Kevin,

On Friday kevin brintnall wrote:

> On Fri, Apr 17, 2009 at 08:38:10AM +0200, Tobias Oetiker wrote:
> > disclaimer: this is about 1.5 not 1.4 !
> >
> > [...]
> >
> > * the (big) disadvantage is that updatev does not work anymore, and
> >   for larger deployments updatev is a cornerstone function in
> >   driving holt winters based alerting.
>
> Tobi, I was already planning updatev support along these lines, although I
> didn't have an idea when we'd integrate it (i.e. 1.5 vs 1.4.later).

:-) 1.5 then, since my plan is NOT to add major new features once a
release is out ...

> > * it would require the cached to read the header information of the
> >   rrdfiles once and cache them internally so that it can calculate
> >   the updates without accessing the disk, but since header
> >   information is quite small, a decent sized machine could
> >   easily keep hundreds of thousands of headers in the cache daemon.
>
> I think the best first approach would be:
>
>  - on the first updatev, take an in-memory copy of the live_head and *_prep
>    - no need to do it for update
>    - CON: when the daemon starts up, there is heavy read load

yep, I guess that's the price ...
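
To make the "copy on first updatev" idea concrete, here is a minimal
sketch in Python. It is not rrdcached code: Header, HeaderCache and
fake_loader are made-up names, and the loader only stands in for
reading live_head and the *_prep areas from disk. It shows that each
file costs one header read, at the time of its first updatev.

```python
from dataclasses import dataclass


@dataclass
class Header:
    """Hypothetical stand-in for the in-memory copy of an RRD header."""
    ds_cnt: int
    last_update: int = 0


class HeaderCache:
    """Loads a header on the first request for a file, then reuses it."""

    def __init__(self, loader):
        self._loader = loader   # path -> Header; a disk read in real life
        self._headers = {}      # path -> cached Header
        self.disk_reads = 0     # counts how often the loader was called

    def get(self, path):
        hdr = self._headers.get(path)
        if hdr is None:
            # First updatev for this file: pay the read cost once.
            hdr = self._loader(path)
            self.disk_reads += 1
            self._headers[path] = hdr
        return hdr


def fake_loader(path):
    """Assumed loader; pretends every file has two data sources."""
    return Header(ds_cnt=2)


cache = HeaderCache(fake_loader)
cache.get("a.rrd")
cache.get("a.rrd")   # served from memory, no second read
cache.get("b.rrd")
print(cache.disk_reads)  # 2
```

With lazy loading like this, the read load after a daemon restart is
spread over the first updatev of each file rather than paid up front,
but it is still one read per file.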

>  - a copy of the header allows us to do input validation on update strings
>    (i.e. correct ds_cnt)
>
>  - split up process_arg() into:
>    - parse update string, update in-memory pdp/cdp/*_prep
>    - write the RRAs
>
> I thought we'd do it in stages:
>
> (1) Process the update string twice.  Once when updatev received from the
>     client (update in-memory copy).  Once when we call rrd_update_r() with
>     the update string.
>
>     - minimal changes to the current rrdcached code/data structures
>     - CPU overhead minimal, still large delayed IO benefit of rrdcached
>     - easiest change that gives updatev support
>
> (2) Process the update string once.  Cache the computed results, and write
>     them out to the RRD after delay.
>
>     - requires we re-think rrd_update_r() or create a function to pass
>       pre-computed values into the RRD
>
> I'm guessing (1) will be significantly easier than (2), and provide almost
> the same functionality (if slightly less efficiently)..  I think staging
> it like this would make the most sense.

do you see a lot of re-use of code from (1) in (2)? If not, what is
the point of staging the work, given that it all happens in the
safety of trunk anyway?
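
For reference, stage (1) as I read it, as a toy sketch in Python. None
of this is the real rrd_update_r() or rrdcached code: State, process
and Daemon are invented names, and "processing" here only validates
the update string and returns the parsed row, where the real thing
would update pdp/cdp/*_prep.

```python
class State:
    """Toy stand-in for the header state needed to process an update."""

    def __init__(self, ds_cnt, last_update=0):
        self.ds_cnt = ds_cnt
        self.last_update = last_update


def process(state, update):
    """Parse 'timestamp:v1:v2:...', validate it against state, advance state."""
    fields = update.split(":")
    ts = int(fields[0])
    values = [float(v) for v in fields[1:]]
    if len(values) != state.ds_cnt:
        raise ValueError("expected %d values, got %d" % (state.ds_cnt, len(values)))
    if ts <= state.last_update:
        raise ValueError("timestamp %d is not after %d" % (ts, state.last_update))
    state.last_update = ts
    return (ts, values)


class Daemon:
    """Stage (1): every update string is processed twice."""

    def __init__(self, ds_cnt):
        self.disk_state = State(ds_cnt)   # what the file on disk would hold
        self.disk_rows = []               # rows "written" to the file
        self.cached = State(ds_cnt)       # in-memory copy taken on first updatev
        self.queue = []                   # raw update strings awaiting flush

    def updatev(self, update):
        # First pass: in-memory copy only, result goes back to the client.
        row = process(self.cached, update)
        self.queue.append(update)         # reached only if validation passed
        return row

    def flush(self):
        # Second pass: replay the same strings against the on-disk state.
        for update in self.queue:
            self.disk_rows.append(process(self.disk_state, update))
        self.queue.clear()


d = Daemon(ds_cnt=2)
print(d.updatev("100:1:2"))   # (100, [1.0, 2.0])
print(d.disk_rows)            # []
d.flush()
print(d.disk_rows)            # [(100, [1.0, 2.0])]
```

In this toy, stage (2) would mean queueing the rows returned by
process() instead of the raw strings and writing them out directly in
flush(), so the second pass goes away. The parsing and validation part
looks reusable; the replay in flush() is what would get thrown away.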

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi at oetiker.ch ++41 62 775 9902 / sb: -9900