[rrd-developers] [PATCH,RFC] optional mmap based file I/O

Dave Plonka plonka at doit.wisc.edu
Wed May 30 19:07:25 CEST 2007


Hi Tobi,

On Wed, May 30, 2007 at 01:22:48PM +0200, Tobias Oetiker wrote:
> 
> as far as I have seen, linux (at least) will keep newly written
> blocks in cache anyway (since they are dirty). I have not seen
> freshly read blocks being evicted yet; I guess the frequency of
> use only plays a role as to which 'old' blocks to evict ... all
> in all, the result is that

Yes, a big part of the problem was that the OS (Linux, at least)
didn't distinguish between the RRD file blocks it read unnecessarily
because of default readahead and the real hot blocks, so when buffer
cache was scarce (most often because of that readahead) it was just as
likely to evict hot blocks as the unnecessary ones.
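
For concreteness, something along these lines is what I have in mind
(a minimal sketch; the rrd_map() wrapper and its error handling are
made up, only mmap() and madvise() themselves are the real interfaces):

    #include <fcntl.h>
    #include <stddef.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* map an RRD file read/write and disable readahead on it */
    static void *rrd_map(const char *path, size_t *len)
    {
        struct stat st;
        void *map;
        int fd = open(path, O_RDWR);

        if (fd < 0)
            return NULL;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }
        *len = st.st_size;
        map = mmap(NULL, *len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
        close(fd);              /* the mapping keeps its own reference */
        if (map == MAP_FAILED)
            return NULL;
        /* random-access hint: suppress the default readahead that
         * competes with the hot per-update blocks for buffer cache */
        madvise(map, *len, MADV_RANDOM);
        return map;
    }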

> rrd_create (especially), as well as the 'large read' commands, will
> evict the 'interesting' blocks paged in through rrd_update during
> the last update cycle ...
> 
> I grant you that one may want to prevent DONTNEED from being set on
> fetch if one knows that subsequent graph or fetch commands will use
> the data ... so this should be available as a commandline option
> ...  --keep-cache

Cool.  With the default being conservative (i.e. the normal OS
buffer-cache policies, but no readahead), that sounds like a good way
to proceed, and to get an rrdtool version out there with these
features so we can gather the sum of experience from testing on lots
of systems.
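
Something like this is what I'd picture for the fetch path (a sketch
only; maybe_drop_cache() is a made-up name, and whether this ends up
as madvise() or posix_fadvise() depends on the I/O mode):

    #include <stddef.h>
    #include <sys/mman.h>

    /* After fetch/graph has read its data: unless the user asked
     * for --keep-cache, tell the VM we are done with these pages so
     * they don't push the hot per-update blocks out of the cache. */
    static void maybe_drop_cache(void *map, size_t len, int keep_cache)
    {
        if (!keep_cache)
            madvise(map, len, MADV_DONTNEED);
    }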

> as for determining which RRA block will be written next, this is
> pretty simple from the header information ... due to RRD's
> write-forward nature, it is certain within a single rrd file which
> blocks will be accessed next ... for bulk updates this does have
> other implications, I agree.

OK, it sounds like we're thinking similarly.  My overall point is not
to spend too much time on code that micromanages the buffer cache when
the existing algorithms will do pretty well for us once we've reduced
the number of pages read on the periodic updates (when RRD is used in
near-realtime), so that there's some space left over in the buffer
cache.
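
If we do want one targeted hint, it could be kept as small as this
(a sketch; the parameters are stand-ins for whatever the header
actually provides, not the real rrd_t layout):

    #include <stddef.h>
    #include <sys/mman.h>

    /* Hint the VM about the page that the *next* update of an RRA
     * will dirty.  'map' must be the page-aligned start of the
     * mapping; everything else comes from the file header. */
    static void hint_next_row(char *map, size_t rra_start,
                              size_t row_size, unsigned long cur_row,
                              unsigned long row_cnt, size_t pagesize)
    {
        unsigned long next_row = (cur_row + 1) % row_cnt;
        size_t off = rra_start + next_row * row_size;
        size_t page_off = off & ~(pagesize - 1);

        madvise(map + page_off, pagesize, MADV_WILLNEED);
    }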

> A further optimization for bulk updates would be to always write a
> full block of data in one go; this would prevent the OS from
> reading the current block back in ...

We don't really need to do anything to get this optimization, since
it is essentially what happens already with asynchronous writes by
update/bdflush/pdflush: as long as you do a bunch of updates within a
small number of seconds on a page that is in cache, the writes get
coalesced (in each dirty page) and written out together.  This is
what we've observed with Dale Carder's RRDCache application-level
cache strategy.  The CSV "journal" method that Kevin mentioned recently
sounds essentially the same: i.e. just save up the update arguments
in a journal, then group them by RRD file and call update many times
in quick succession, periodically.
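
For illustration, the per-file flush step of such a journal could be
as simple as this (a sketch; flush_file() and the sample queue are
made up, but rrd_update()'s argc/argv interface and its acceptance of
multiple timestamp:value arguments per call are real):

    #include <rrd.h>

    /* Flush the samples queued for one RRD file in a single call,
     * so all of them hit the file while its pages are dirty and
     * get written out together. */
    static int flush_file(const char *file, char **samples, int n)
    {
        int i, argc = n + 2;
        char *argv[argc];          /* C99 variable-length array */

        argv[0] = "update";        /* argv[0] is skipped by getopt */
        argv[1] = (char *) file;
        for (i = 0; i < n; i++)
            argv[i + 2] = samples[i];   /* e.g. "1180537200:42" */

        return rrd_update(argc, argv);
    }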

> > > ok, convinced ... SunOS 3 (or whatever OS changed the page size)
> > > compatibility is probably not such an issue ...
> >
> > I would expect to see an 8KB page size on 64-bit archs.  (For instance,
> > Linux on alpha even has an 8KB block size for ext2.)
> 
> the problem just arises when the page size for a program can be
> different at compile time and at execution time ... are there any
> OSes that have this issue today?

Oh, I misunderstood.  You're trying to decide whether to test the page
size at run time or at compile time?  I'd do it at run time: sysconf()
or getpagesize() should cost next to nothing.
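
I.e. something as simple as this (a sketch; rrd_pagesize() is a
made-up name):

    #include <unistd.h>

    /* page size probed at run time, not frozen in at compile time */
    static long rrd_pagesize(void)
    {
        long sz = sysconf(_SC_PAGESIZE);    /* POSIX */
        return sz > 0 ? sz : getpagesize(); /* legacy BSD fallback */
    }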

Dave

-- 
plonka at doit.wisc.edu  http://net.doit.wisc.edu/~plonka/  Madison, WI


