[rrd-developers] [PATCH,RFC] optional mmap based file I/O

Wed May 30 11:49:40 CEST 2007

Hi Tobi,

On Wed, May 30, 2007 at 09:20:20AM +0200, Tobi Oetiker wrote:
> > >* why do you flag the first TWO pages ? rrd files with headers
> > >  taking up two pages should be pretty rare ...
> >
> > Fair enough. My small test rrd's all had about 4192 bytes of headers
> > (IIRC), so i thought that using 2 pages should be a good starting point.
> > Easy enough to change, if you are confident that 1 page is enough for
> > the majority.
> 
> hmmm well at the moment your strategy with madvise is not entirely
> transparent to me ... I think we need the following:
> 
> a) tell the OS that we do RANDOM access to prevent readahead while
>    accessing the header portion
> 
> b) set sequential access for stuff like rrd_fetch, rrd_resize, rrd_dump
>
> c) set DONTNEED (after reading/writing) for all blocks except the
>    header and the 'hot' RRA blocks.

Given (a), I'm wondering if both (b) and (c) are being suggested
prematurely.  Ideally, we would carefully measure what's going on here,
and the effectiveness of (b) and (c) will vary by platform due to
initial readahead sizes, adaptive readahead algorithms, and DONTNEED
implementation differences.  (Rhetorical: Are we assuming Linux?)

Regarding (b) SEQUENTIAL: using [fm]advise to select sequential access
within just the right regions (by file offset and length) seems OK,
but it will still sometimes do the wrong thing if it results in
readahead across an RRA boundary.  I suggest sticking with just
RANDOM and letting the pages be read into buffer-cache as they're
accessed, one at a time.  The adaptive readahead algorithm in Linux is
pretty aggressive, and it will probably over-read otherwise.  Sometimes
that aggressive readahead is OK, but not for a file-based database such
as RRD that is made of very many files, each with such a short
open/close time (that doesn't permit the adaptive readahead algorithm
to adapt, or rather causes it to forget at file close time).

Regarding (c) DONTNEED: the normal page replacement strategy that
manages the buffer-cache should work well on its own, replacing the
pages that are either least frequently or least recently used.  (E.g.
Linux has a 2Q-like management strategy that tracks both LRU and
frequency of reference even across page evictions...  it does the right
thing much of the time.  The problem was that it didn't differentiate
between RRD file pages that were read ahead and those that were
actually needed.)

If we call DONTNEED for all the pages/blocks that a graph call
wanted (to do thresholding or whatever) it can cause the pages to
be evicted immediately from buffer-cache and there are situations
(such as when other graph commands are accessing the same area) when
this is the Wrong Thing(tm).

Whether or not one considers this a problem with the advisory APIs
or their implementation, I find DONTNEED to be too aggressive for
the purpose you describe in (c) because it causes immediate action
rather than just using it as an advice or hint to the page replacement
algorithm.

Of course, it is sometimes convenient that DONTNEED takes immediate
action in Linux 2.6 such as when one wants to forcibly evict a file's
pages from buffer-cache (like you do with rsync or I do with the
fadvise command http://net.doit.wisc.edu/~plonka/fadvise/ ), but it
presumes the caller is *omniscient*; however, one process calling RRD
fetch/graph/etc. just doesn't really know if another process wants
those pages.
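
To make the "forcible eviction" case concrete, here is a minimal
sketch of asking the kernel to drop one file's pages.  The name
evict_file_pages is hypothetical, and this is not the source of the
fadvise command linked above, just the same kind of call under the
assumption of a POSIX system that provides posix_fadvise.

```c
#define _XOPEN_SOURCE 600   /* for posix_fadvise */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: advise the kernel that none of this file's
 * cached pages are needed.  Returns -1 if the file cannot be opened,
 * otherwise the posix_fadvise result (0 on success, else an error
 * number).  Whether pages are dropped at once is up to the kernel. */
static int evict_file_pages(const char *path)
{
    int fd, rc;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0, len 0: the advice covers the whole file.  This
     * affects every process using the file, not just the caller. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return rc;
}
```

Note that nothing in the call lets the caller say "unless someone
else is using these pages", which is the omniscience problem.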

Lastly, maybe I misunderstand, but for DONTNEED, are you suggesting
that a fetch or graph command should treat some pages as hotter (i.e.
those that would be updated in near-realtime) and work around those
so they don't accidentally get evicted?  That seems complicated to
implement, and again implies omniscience on the part of one process
as to what other processes are doing (i.e. it assumes updates in
near-realtime, but updates can be deferred, and even are in some of
the application cache implementations out there).

> > >* for figuring the pagesize you should use getpagesize if
> > >  available.
> >
> > So i vote for using sysconf, and adding checks for
> > _SC_PAGESIZE, _SC_PAGE_SIZE, PAGESIZE, PAGE_SIZE
> >
> > Ok?
> 
> ok convinced ... sunos3 (or whatever os changed the page size)
> compatibility is probably not such an issue ...

I would expect to see an 8KB page size on 64-bit archs.  (For instance
Linux on alpha even has an 8KB block size for ext2.)
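
A rough sketch of the sysconf approach with the checks listed above,
plus the arithmetic for how many pages a header occupies.  Both
function names are hypothetical, and the final 4096 fallback is purely
an assumption for systems that offer none of the four names.

```c
#include <assert.h>
#include <limits.h>   /* PAGESIZE / PAGE_SIZE, where defined */
#include <stddef.h>
#include <unistd.h>   /* sysconf, _SC_PAGESIZE / _SC_PAGE_SIZE */

/* Hypothetical helper: best-effort page size in bytes. */
static long rrd_page_size(void)
{
    long sz = -1;

#if defined(_SC_PAGESIZE)
    sz = sysconf(_SC_PAGESIZE);
#elif defined(_SC_PAGE_SIZE)
    sz = sysconf(_SC_PAGE_SIZE);
#endif
    if (sz > 0)
        return sz;

#if defined(PAGESIZE)
    return PAGESIZE;
#elif defined(PAGE_SIZE)
    return PAGE_SIZE;
#else
    return 4096;   /* assumption: last-resort default */
#endif
}

/* Hypothetical helper: number of pages needed to cover a header of
 * header_bytes, rounding up to a whole page. */
static size_t header_pages(size_t header_bytes, long page_size)
{
    assert(page_size > 0);
    return (header_bytes + (size_t)page_size - 1) / (size_t)page_size;
}
```

With the roughly 4192-byte headers mentioned earlier, header_pages
gives 2 for a 4KB page and 1 for an 8KB page, so the "one page or
two" question depends on the platform as much as on the RRD.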

Dave

-- 
plonka at doit.wisc.edu  http://net.doit.wisc.edu/~plonka/  Madison, WI