[rrd-developers] [rrd] Re: rrdcached dies when journal reaches 2G

Fri Jul 17 22:56:02 CEST 2009

On Fri, Jul 17, 2009 at 10:34:43PM +0200, Tobias Oetiker wrote:
> > I've started to implement both options..  The large file support is dead
> > simple, whereas the split journal support is much more complicated.  There
> > are three combinations:
> >
> >  (1) go with large file support
> >      (1a) OS supports it
> >      (1b) OS doesn't support it
> >
> >  (2) split journal
> >
> > In cases (1b) and (2), we need to watch the journal size as we write to
> > it, and roll-over to a new file once the current file gets too large.  For
> > (1b), a full tree flush is required.
> >
> > The forced flush requirement for (1b) doesn't seem too onerous when we
> > consider that a journal limited to 2^31 bytes will nearly always be on a
> > system whose per-process memory space is limited to 2^32 bytes (or
> > smaller).  Under these conditions, there are only a couple of cases where
> > rrdcached will not run out of memory anyway.  If -w (the write timeout)
> > is small, then the forced flush won't have many values to write.  If -f
> > (the flush timeout) is small, then the flush would only be twice as
> > often as normal in the worst case.
> >
> > I'm looking for some old OSes without large file support to test with, but
> > I'm having a hard time finding one.  Perhaps that's a good sign.  Do you
> > have any demographic info on the RRD install base?
> 
> I only have access stats for the web site, which does not really help
> regarding OSes ... there are a number of people who run rrdtool on
> embedded systems, probably with old Linux kernels ... also NetWare is
> a target where I am not sure about large file support, but then
> again these systems are mostly running old versions of rrdtool,
> since the library dependencies of newer versions are much more
> complex anyway ...
> 
> > I'm leaning strongly towards (1) after seeing the implications on the
> > code...
> 
> I think a forced flush is a good solution here ...
> 
> I have not looked at the code, but do you only reset the journal on
> flush, or when there is no data left in the cache? This might
> cause a long-running copy of rrdcached to generate rather huge and
> unwieldy journals ...

Currently, the journal is rotated when the flush is started.  This way,
any updates which are not flushed to disk are guaranteed to be in one of
the two journals.  In practice, there are not many un-flushed updates in
the old journal.  The forced flush is required to guarantee that any
un-written records are entirely within the two files.
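
To make the two-file scheme concrete, here is a rough sketch in Python
(not the daemon's C code).  The class name, the file names and the
force_flush callback are hypothetical and chosen only for illustration.
It assumes one record per line and a size limit just under 2^31 bytes.

```python
import os

# Assumed limit for a system without large file support (case 1b).
JOURNAL_MAX_BYTES = 2**31 - 1


class TwoFileJournal:
    """Illustrative model of a current/old journal pair (names are made up)."""

    def __init__(self, directory, max_bytes=JOURNAL_MAX_BYTES):
        self.cur_path = os.path.join(directory, "rrd.journal")
        self.old_path = os.path.join(directory, "rrd.journal.old")
        self.max_bytes = max_bytes
        self._open_current()

    def _open_current(self):
        # Open (or create) the current journal and remember its size.
        self.fh = open(self.cur_path, "ab")
        self.size = os.path.getsize(self.cur_path)

    def rotate(self):
        """Call when a flush starts: cur becomes old, a fresh cur is opened."""
        self.fh.close()
        os.replace(self.cur_path, self.old_path)  # discards the previous old file
        self._open_current()

    def append(self, record, force_flush):
        """Write one record; if it would not fit, rotate and force a full flush."""
        line = record.encode() + b"\n"
        if self.size + len(line) > self.max_bytes:
            self.rotate()
            force_flush()  # hypothetical hook: write the whole cache to disk
        self.fh.write(line)
        self.fh.flush()
        self.size += len(line)
```

Because the forced flush writes out everything journaled so far, whatever
remains un-written afterwards can only be in the new current file, so the
next rotation may safely discard the old one.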

> a directory tree with bits of journal that can be unlinked as their
> content has gone to disk would seem to be a more sustainable solution
> ...

My current implementation for (2) is a set of files rrd.journal.$TIME.
The daemon manages the files in a group, rotating the entire set (cur ->
old -> unlink).  At journal replay time, the files are enumerated, sorted,
and replayed in order.  In this case, a forced flush is not required.  It
will take longer to finish and test this implementation.
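
The replay step for (2) could look roughly like the sketch below, again
in Python rather than C.  It assumes that $TIME is an integer timestamp
and that each journal line is one update; the function names and the
apply_update callback are hypothetical.

```python
import os

JOURNAL_PREFIX = "rrd.journal."


def journal_files(directory):
    """Return journal paths ordered oldest first by numeric timestamp suffix."""
    found = []
    for name in os.listdir(directory):
        suffix = name[len(JOURNAL_PREFIX):]
        # Skip anything that is not rrd.journal.<digits>.
        if name.startswith(JOURNAL_PREFIX) and suffix.isdecimal():
            found.append((int(suffix), os.path.join(directory, name)))
    return [path for _, path in sorted(found)]


def replay(directory, apply_update):
    """Feed every journaled line to apply_update, oldest file first."""
    count = 0
    for path in journal_files(directory):
        with open(path, "r") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line:
                    apply_update(line)
                    count += 1
    return count
```

Sorting on the numeric value rather than on the file name keeps the
order right even if the timestamps differ in length.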

-- 
 kevin brintnall =~ /kbrint at rufus.net/