[rrd-developers] rrdcached + collectd issues

Sat Oct 10 01:41:55 CEST 2009

Kevin, thanks for your thoughts.

kevin brintnall wrote:
> Hi Thorsten,
> 
> On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
>> I've been trying to get rrdcached and collectd to work together under 
>> load and am running into a number of issues.
>> I'm using:
>>  - collectd 4.7.4
>>  - rrdtool trunk rev 1889 (had trouble compiling collectd with newer 
>> versions)
>>  - rrdcached stats snapshot:
>> 9 Statistics follow
>> QueueLength: 0
>> UpdatesReceived: 91123595
>> FlushesReceived: 83983
>> UpdatesWritten: 210810
>> DataSetsWritten: 82474667
> 
> It looks like your RRD files must have a very large number of DS?  Almost
> 400?

No, in fact most have 1 (that's how collectd likes it). You  may be 
looking at the recevied/written ratio which is skewed by journal replay 
and the very long cache period I configure (1 hour).

>> TreeNodesNumber: 25925
>> TreeDepth: 17
>> JournalBytes: 6846161193
>> JournalRotate: 4
>>  - approx 3k updates written to rrdcached per second
>>  - approx 200-300KB written to journal per second
>>  - approx 2k-3k data sets written per second
>>  - rrdached params: -w 3600 -z 3600 -f 7200 -t 10
>>  - disk I/O is not an issue
>>  - rrdcached memory usage is not an issue (grows to 0.8GB then stays 
>> totally flat), no swapping
>>  - running collectd, rrdcached, and custom graphing app on same 
>> dual-core server, verified that flushing for graphing is working properly
>>
>> First issue is that over time the data in the rrd files lag behind the
>> data arriving into collectd in the network. After 12 hours I see approx
>> a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
>> symptoms are that data in the rrd files continues to advance at the
>> normal rate (20 second interval in our case) but just in the past. The
>> fact that the delay is steady leads me to believe that it's a program
>> bug (I've seen delays due to I/O overload in the past and different rrds
>> then show different lags and jump forward when they finally get some
>> disk attention). I've done some tests looking at the last_updated in the
>> rrd and looking at what the rrdcached daemon returns to a PENDING
>> command for the same file and I'm sure the flushing works. The daemon
>> just doesn't receive newer updates. The journal is also in sync with all
>> this.  If I restart collectd, then the lag pretty quickly vanishes. So
>> either collectd has some queue with a bug, or data is queued in the
>> socket between collectd and rrdcached.
> 
> Thorsten,
> 
> This sounds like collectd not sending updates to rrdcached.  If they are
> not in the journal, then rrdcached has not received them.

Yes, the question is whether it's collectd's fault or rrdcached's fault..

>> The second issue, which may possibly cause the first one is that the cpu 
>> consumed by rrdcached is way too high. After running for about an hour 
>> it consumes a full cpu (~90% user + ~10% system). It could be that 
>> that's causing the above lag, dunno.
> 
> If the updates are all being sent by librrd, then each file update
> requires one read() and one write() system call.  The small read/write
> syscalls may increase CPU utilization.  "BATCH" mode was introduced to
> deal with this.

Collectd doesn't use BATCH, but then, I'm using rrdcached to "batch" 
updates into fewer writes. The caching is set to 1 hour, each rrd is 
updated every 20 seconds. That seems to work fine.

> In my environment, I am bursting to 3.4k updates/sec (800 avg).  rrdcached
> uses very little CPU during these bursts (<0.2%).  I am writing directly
> to the daemon using "BATCH".

Yeah, as I mentioned above, I'm very steady at 3k updates received per sec.

>> I/O is not a problem as I mentioned, it's pure CPU. I've compiled
>> rrdcched with -pg to get gprof output, but haven't been successful. I
>> commented out install_signal_handlers (left the USR2 to be able to
>> terminate gracefully) and ran with -g, but the gprof output shows only
>> ~2 minutes of CPU time profiled when the daemon accumulated
>>> 250mins. Here's the top of the output:
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>  55.12     62.39    62.39 280843249     0.00     0.00  buffer_get_field
>>  11.33     75.22    12.83 93607575     0.00     0.00  send_response
>>  10.32     86.91    11.68 93464852     0.00     0.00  handle_request_update
>>   5.36     92.97     6.06                             connection_thread_main
>>   4.03     97.53     4.57 93683555     0.00     0.00  handle_request
>>   3.46    101.46     3.92 93484712     0.00     0.00  check_file_access
>>   3.29    105.18     3.72 176583057     0.00     0.00  next_cmd
>>   1.33    106.69     1.51 93686967     0.00     0.00  find_command
>>   1.23    108.08     1.40 88419974     0.00     0.00  journal_write
>>   1.00    109.22     1.14 93672403     0.00     0.00  has_privilege
>> It looks to like that's mostly the journal replay stuff and very little 
>> more. If someone has tips on how to get real profiling output, I'm all ears.
> 
> I've never had this sort of problem...  What is your `uname -a` ?

Linux sketchy1.rightscale.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 
12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

>> The journal replay is too slow. When I terminate the daemon it leaves 
>> several GB of journal files behind. Reading those in takes the better 
>> part of an hour, during which the daemon is unresponsive.
> 
> That seems awfully long.  Mine is able to replay 83M entries (about 8GB)
> in 7 minutes.

One thing I noticed is that it starts out quite fast and gradually slows 
down. I may be able to run a more controlled experiment. The more I 
think about it, I have the feeling some rrdcached data structure is 
getting slower and slower as it grows. How big is your process? As I 
mentioned, mine is 0.8GB.

>> Most of time is in buffer_get_field. (Note: in the most common cases
>> buffer_get_field copies each field in-place, character by
>> character. Seems to me that a simple if statement could avoid the
>> writes.)
> 
> Agreed that buffer_get_field implementation is not optimal.  From what I
> can tell, it copies this way for three reasons:
> 
> (1) desire to provide null terminated string to caller
> (2) do not want to modify original string (in case need to write it to journal)
> (3) allow escaped characters (presumably space)

Mhh, I don't think my C has gotten that rusty. Please look at the code. 
Here are significant snippets:
   buffer = *buffer_ret;
   field = *buffer_ret;
   field[field_size] = buffer[buffer_pos];
That's in-place modification as far as I can tell. In fact, in most 
cases (no \ escape chars) it actually reads and writes the same byte in 
the above assignment, so it doesn't actually copy anything.

>> By the way, I find the unix socket stuff undebuggable. I switched to TCP 
>> sockets because I can telnet to the socket and find out what the daemon 
>> is doing. (For example, when nothing seems to work for almost an hour 
>> when I start the daemon because it's replaying logs there is no 
>> information about what's going on anywhere.) I'm saying this because 
>> everyone recommends the unix sockets for security reasons. It's unusable 
>> IMHO.
> 
> FYI netcat works just fine for talking on a unix socket:  nc -U /wherever/sock

Ah, I must have missed that somehow. Will try, thanks!

Thorsten