[rrd-developers] rrdcached performance problem

Sat Oct 31 20:12:31 CET 2009

kevin brintnall wrote:
> On Sat, Oct 31, 2009 at 09:52:15AM -0700, Thorsten von Eicken wrote:
>   
>> Quick follow-up. I decided to add another 3k updates per second (extra 
>> 30k tree nodes) to my test run. See results in
>> http://www.voneicken.com/dl/rrd/rrdcached-7.png
>> What's interesting is that the server got somewhat overloaded sitting a 
>> lot in I/O wait. By and large the flush queue length remained under 
>> control, except when doing backups (10pm, 8:30am). Memory usage by 
>> rrdcached and collectd remained under control, but there is a long term 
>> upward-trending slope to rrdcached's memory usage which is not good. 
>> Possibly related to the power-of-two allocator patch that Florian 
>> provided. The graph I find most interesting is the one for disk sdk 
>> ops (3rd from the end). Before adding the last chunk of traffic the disk 
>> load was write-dominated, which means that rrds were mostly cached in 
>> memory (5-6 GB left after the processes). After adding the extra load 
>> the disk load became read-dominated indicating that the rrd working set 
>> exceeded memory.
>>     
>
> Thorsten,
>
> If you're becoming read dominated, you should consider lowering your file
> update/sec rate by increasing your -w/-f timers.  This just trades one
> kind of cache memory (f/s blocks) for another (update strings).
>   
Yeah, but given that I'm already using 1 hour of caching it starts 
getting a bit uncomfortable. I haven't re-tested the journal reading 
yet, but at some point restarting rrdcached becomes really difficult. 
But regardless, I suspect there are 3 performance regimes:
 - working set much smaller than memory - CPU scales linearly/smoothly 
   with update rate
 - working set much larger than memory - it's all about random small 
   disk I/O throughput
 - working set similar to memory - relatively sharp (?) transition from 
   CPU/memory speed to disk speed
It's pretty clear that performance when operating in memory is very 
nice and smooth, but once memory is exceeded things become bumpy. It 
degrades gracefully and recovers well: I saw 9k queue items at one 
point, and that backlog took close to an hour to work off, but it did 
drain rather than spin out of control. Still, it doesn't look like a 
safe operating regime. So basically gotta keep the rrd working set in 
memory.
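
To make the three regimes above a bit more concrete, here is a rough 
Python sketch that classifies a setup by comparing an estimated rrd 
working set against the memory left for the page cache. Everything in 
it is an assumption for illustration: the 4 KiB block size, the number 
of hot blocks per file, the file counts, and the 25% band around the 
transition are made-up values, not measurements from my test run. Only 
the roughly 5 GB of free memory comes from the numbers quoted above.

```python
# Rough sketch: classify the regime from an estimated RRD working set.
# All numbers and thresholds are illustrative assumptions, not measurements.

BLOCK_SIZE = 4096  # assumed filesystem block size in bytes


def working_set_bytes(num_files, hot_blocks_per_file):
    """Estimate page-cache bytes needed to keep every file's hot blocks resident."""
    return num_files * hot_blocks_per_file * BLOCK_SIZE


def regime(ws_bytes, free_mem_bytes, band=0.25):
    """Return which of the three regimes applies; `band` is an arbitrary margin."""
    ratio = ws_bytes / free_mem_bytes
    if ratio < 1.0 - band:
        return "in-memory"    # CPU scales smoothly with update rate
    if ratio > 1.0 + band:
        return "disk-bound"   # dominated by random small disk I/O
    return "transition"       # sharp change from memory speed to disk speed


free_mem = 5 * 1024**3  # about 5 GB left for the page cache
for files in (100_000, 300_000, 600_000):  # hypothetical file counts
    ws = working_set_bytes(files, hot_blocks_per_file=4)
    print(files, round(ws / 1024**3, 2), "GiB ->", regime(ws, free_mem))
```

With these made-up inputs the three file counts land in the in-memory, 
transition, and disk-bound regimes respectively. The useful part is the 
ratio, not the exact thresholds.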

> I'm sending a linear chunk allocator along for allocating
> cache_item_t.values in operator-defined block sizes..  I'd appreciate if
> you'd test it with your load to see if it reduces your CPU usage related
> to frequent realloc().
>
>   
What value of -m do you suggest? -w divided by step size?
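
For reference, this is the arithmetic I have in mind, as a small Python 
sketch. It assumes that -m in your patch means the number of values per 
allocated chunk (I'm guessing at the semantics), that -w is 3600 
seconds (my 1 hour of caching), and that the step is 10 seconds, which 
I'm inferring from 3k updates/sec across 30k tree nodes and which only 
holds if every node updates at the same interval.

```python
def values_per_flush(write_timeout_s, step_s):
    """Updates that accumulate for one file between flushes, rounded up."""
    return -(-write_timeout_s // step_s)  # ceiling division on integers


# Assumed inputs: 30k nodes at 3k updates/sec gives a 10 s step,
# and -w is taken to be 3600 s (1 hour of caching).
step = 30_000 // 3_000
print(values_per_flush(3600, step))
```

If that reading is right, one chunk sized to -w divided by step would 
hold a full flush interval of values per file without a realloc().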

Thanks,
Thorsten