[rrd-developers] rrdcached contention when flushing

Tue Nov 4 18:12:09 CET 2008

On Tue, Nov 04, 2008 at 04:57:27PM -0000, Daniel.Pocock at barclayscapital.com wrote:

>>> They all become un-stuck at the same time, maybe 20 seconds later, and
>>> then the graphs appear very quickly.

One idea is to trace the execution of rrdcached and see whether it's
blocking on any I/O syscalls when the stalls happen.
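
A minimal sketch of such a trace, assuming strace is installed and the
daemon can be found with pidof.  The helper names, the default log path and
the 1-second threshold are only illustrative.  strace's -T flag appends the
time spent in each call as <seconds>, so a small awk filter can pick out the
slow ones.  Note that stalls caused by page faults on mmap()ed memory are
not syscalls and will not show up here.

```shell
# Attach to the running rrdcached (all threads, -f) and log I/O-related
# syscalls with wall-clock timestamps (-tt) and per-call durations (-T).
# trace_rrdcached is a hypothetical helper name; stop it with Ctrl-C.
trace_rrdcached() {
    strace -f -tt -T \
        -e trace=read,write,pread64,pwrite64,fsync,fdatasync,msync \
        -o "${1:-/tmp/rrdcached.strace}" \
        -p "$(pidof rrdcached)"
}

# Read strace -T output on stdin and print only the calls that took at
# least $1 seconds (the duration is the trailing <seconds> field).
slow_syscalls() {
    awk -v min="$1" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 >= min + 0) print
        }'
}
```

Usage would be something like running trace_rrdcached until a stall has
passed, then "slow_syscalls 1 < /tmp/rrdcached.strace" to list anything
that blocked for a second or more.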

> I've experimented with sysctl, here are values I'm currently using:
> 
> vm.dirty_expire_centisecs = 179971
> vm.dirty_writeback_centisecs = 35993
> vm.dirty_ratio = 90
> vm.dirty_background_ratio = 2
> vm.max_map_count = 4000000
>
> If I understand correctly, then vm.dirty_ratio means nothing should
> block until 90% of the RAM is taken up by dirty pages.  Given that
> mmap() is being used with MAP_SHARED, and I have 8GB of RAM, all the
> necessary pages should be staying in RAM.  If you can suggest a more
> appropriate strategy for configuring the cache, it would be very
> welcome.

I don't think I can compete with the many tuning resources already out
there.  I'm primarily a FreeBSD guy.  Here's what I'm using on my one
Linux box.

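	for disk in sda sdb ; do
	    ## give the scheduler something to work with
	    echo 512 > "/sys/block/$disk/queue/nr_requests"

	    ## set read-ahead to 2 file system blocks
	    ## (--setra counts 512-byte sectors: 16 = 8 KiB, assuming 4 KiB blocks)
	    blockdev --setra 16 "/dev/$disk"
	done

	## run the kernel's periodic writeback only every 900 seconds
	echo 90000 > /proc/sys/vm/dirty_writeback_centisecs
	## start background writeback at 35% dirty pages
	echo 35 > /proc/sys/vm/dirty_background_ratio
	## block writers and force writeback at 85% dirty pages
	echo 85 > /proc/sys/vm/dirty_ratio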

This lets large write bursts accumulate in RAM, but when the kernel finally
blocks for writeback, all I/O blocks.  It's not optimal.
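
To check whether a stall coincides with the kernel hitting those
thresholds, it may help to watch the Dirty and Writeback counters in
/proc/meminfo while it happens.  A rough sketch; dirty_kb and watch_dirty
are hypothetical helper names, and the 1-second polling interval is
arbitrary.

```shell
# Print "Dirty=<kB> Writeback=<kB>" from meminfo-formatted input on stdin.
dirty_kb() {
    awk '$1 == "Dirty:"     { d = $2 }
         $1 == "Writeback:" { w = $2 }
         END { printf "Dirty=%d Writeback=%d\n", d, w }'
}

# Poll once per second with a timestamp; stop with Ctrl-C.
watch_dirty() {
    while :; do
        printf '%s %s\n' "$(date +%T)" "$(dirty_kb < /proc/meminfo)"
        sleep 1
    done
}
```

If Dirty climbs steadily and Writeback jumps at the moment the graphs hang,
the stall is probably the writeback blocking described above.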

> There is also a memory leak somewhere (maybe in my striping code, maybe
> in rrdcached).  I've tried to start rrdcached with valgrind, but my
> large mmap() call fails with EINVAL when using valgrind.

I've been running rrdcached for weeks with no leaks.  About 30k files in
my test environment, step=300.

> The memory leak could be the cause of the performance issue - it grows
> to several gigabytes and there is swapping, that might be reducing the
> amount of RAM available for caching the mmap() pages.  Can you make any
> suggestions for using valgrind or another tool in this scenario?

The leak may be causing other problems.  I'd try running a separate
instance with smaller files and see if you can get valgrind to cooperate.
I haven't found a better tool for tracking down leaks.
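
Something along these lines might work for the separate instance.  This is
only a sketch: leak_check_rrdcached and the scratch-directory layout are
hypothetical, and the rrdcached options (-g foreground, -b base directory,
-l listen address, -p pid file) are taken from its manual page, so check
them against your version.

```shell
# Build (and, unless DRY_RUN=1, run) a valgrind command line for a throwaway
# rrdcached instance.  $1 is a scratch directory holding a few small RRD
# files; the socket, pid file and valgrind log are all kept inside it.
leak_check_rrdcached() {
    dir=${1:?usage: leak_check_rrdcached SCRATCH_DIR}
    set -- valgrind --leak-check=full --num-callers=20 \
        --log-file="$dir/valgrind.log" \
        rrdcached -g -b "$dir" -l "unix:$dir/rrdcached.sock" -p "$dir/rrdcached.pid"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"      # just show what would be run
    else
        "$@"                    # stays in the foreground because of -g
    fi
}
```

Running in the foreground keeps the daemon as the process valgrind started,
so the leak summary is written to the log when it is stopped.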

-- 
 kevin brintnall =~ /kbrint at rufus.net/