[rrd-developers] [PATCH,RFC] optional mmap based file I/O

Bernhard Fischer rep.dot.nop at gmail.com
Thu May 31 00:46:49 CEST 2007


On Wed, May 30, 2007 at 09:37:04AM +0200, Bernhard Fischer wrote:
>On Wed, May 30, 2007 at 09:20:20AM +0200, Tobi Oetiker wrote:
>>Hi Bernhard,
>>
>>> Well, I was not sure about this. rrd_info does close the fd early on, so
>>> we cannot blindly close the fd in rrd_close. If it is ok for you, we can
>>> change rrd_close to close the fd (and adjust the callers accordingly).
>>
>>yes I think this makes sense as rrd_close really implies close ...
>>or do you see a risk in this ?
>
>No risk, sounds fine to me. Will you do this, or should i add it to my
>TODO (i'd prefer the former, ATM)?

I took a stab at this in r1092. Thanks to tobi for applying it!
>>
>>> >* why do you flag the first TWO pages ? rrd files with headers
>>> >  taking up two pages should be pretty rare ...
>>>
>>> Fair enough. My small test rrd's all had about 4192 bytes of headers
>>> (IIRC), so i thought that using 2 pages should be a good starting point.
>>> Easy enough to change, if you are confident that 1 page is enough for
>>> the majority.
>>
>>hmmm well at the moment your strategy with madvise is not entirely
>>transparent to me ... I think we need the following:
>>
>>a) tell the OS that we do RANDOM access to prevent readahead while
>>   accessing the header portion
>
>ok.

That 'ok' may be a bit premature.. Actually, we are currently assuming
that we will need all headers (prefetched sequentially), and until now,
i think that this -- i.e. needing all headers -- holds true.
>>
>>b) set sequential access for stuff like rrd_fetch, rrd_resize, rrd_dump
>
>ok.

rrd_dump already has RRD_COPY, i.e. prefetch all pages of the rrd
involved, in sequential order. This is assumed to warm up the
soon-to-be-dumped rrd early on.

fetch is likely a bit different: As always, we will need the headers,
but (AFAICS) i cannot predict which rra's are going to be needed. So the
usual path is taken, i.e. we assume that we will not need any rra. In my
copy that path uses RANDOM; upstream trunk may still use DONTNEED, since
i lack a useful testcase (one that grabs the Nth rra off M pre-existing
rrd's) to prove that RANDOM is faster than using DONTNEED beforehand.
Not needing any data but the headers is my general working theory of
operation, from the rrd_open() POV. A caller is certainly able to hint
at a soon-to-be-required spot after opening an rrd, of course. That is
obviously the client's duty, i.e. the hint has to be given by the
respective user after it has opened the rrd (i tend to think).

resize is nothing i personally would consider performance critical, so
until now i didn't care much. Is there any reproducible example where
trunk is slower in resize than the 1.2 branch?
>>
>>c) set DONTNEED (after reading/writing) for all blocks except the
>>   header and the 'hot' RRA blocks.
>
>I think i have a note from you about the 'hot' RRA blocks. I'll dig it
>out and see if i understand what you mean.

Do i read this right as the 'next after lastupdate rra' ?
I admit that i have not yet looked at any possible details or
implications thereof.. TODO.

[snip sysconf(_SC_PAGESIZE) config check TODO.]

>>> >* if mmap is not used, then it would be cool if posix fadvise was
>>> >  still called (have not checked the code, just saw that you
>>> >  removed the check from configure)
>>>
>>> fadvise is still called for the FD path. Wrapping the check for fadvise
>>> in $enable_mmap just makes sure that we do not call fadvise for mmap,
>>> but only check for it and eventually use it for the non-mmap path.
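
For the non-mmap path, the hint could look roughly like the sketch
below. The function name is hypothetical and the guard on
POSIX_FADV_RANDOM merely stands in for whatever macro configure ends up
defining; this is not the code from the patch.

```c
#include <fcntl.h>
#include <sys/types.h>

/* FD path only: tell the kernel that the whole file (offset 0,
 * length 0 means "to the end") is accessed randomly.
 * Returns 0 on success or an errno value, like posix_fadvise(). */
int sketch_advise_fd_random(int fd)
{
#ifdef POSIX_FADV_RANDOM
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
#else
    (void) fd;  /* no fadvise on this platform: silently do nothing */
    return 0;
#endif
}
```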
>
>I'm just seeing that there is something wrong (performance-wise) with
>e.g.:
>rrdtool update v.rrd --template traffic_out:traffic_in 1180512000:2013364287020:558938738885
>
>I'm looking at this now. Sounds like it was not a great idea to use
>updatev for benchmarking and i should have concentrated more on update ;)

The fix for this embarrassing brown paperbag glitch of mine was sent to
and applied by tobi as r1094. Thanks and sorry for this oversight of
mine! ( Everybody should use updatev anyway since update is bloat
at any rate ;P )

Whatever. Meanwhile, I'll think a bit about the other valuable points
raised by others in this thread. Thanks for those!


