[rrd-developers] rrdcached + collectd issues

Fri Oct 9 22:44:09 CEST 2009

Hi Thorsten,

On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
> I've been trying to get rrdcached and collectd to work together under 
> load and am running into a number of issues.
> I'm using:
>  - collectd 4.7.4
>  - rrdtool trunk rev 1889 (had trouble compiling collectd with newer 
> versions)
>  - rrdcached stats snapshot:
> 9 Statistics follow
> QueueLength: 0
> UpdatesReceived: 91123595
> FlushesReceived: 83983
> UpdatesWritten: 210810
> DataSetsWritten: 82474667

It looks like your RRD files must have a very large number of DS?  Almost
400?

> TreeNodesNumber: 25925
> TreeDepth: 17
> JournalBytes: 6846161193
> JournalRotate: 4
>  - approx 3k updates written to rrdcached per second
>  - approx 200-300KB written to journal per second
>  - approx 2k-3k data sets written per second
>  - rrdached params: -w 3600 -z 3600 -f 7200 -t 10
>  - disk I/O is not an issue
>  - rrdcached memory usage is not an issue (grows to 0.8GB then stays 
> totally flat), no swapping
>  - running collectd, rrdcached, and custom graphing app on same 
> dual-core server, verified that flushing for graphing is working properly
> 
> First issue is that over time the data in the rrd files lag behind the
> data arriving into collectd in the network. After 12 hours I see approx
> a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
> symptoms are that data in the rrd files continues to advance at the
> normal rate (20 second interval in our case) but just in the past. The
> fact that the delay is steady leads me to believe that it's a program
> bug (I've seen delays due to I/O overload in the past and different rrds
> then show different lags and jump forward when they finally get some
> disk attention). I've done some tests looking at the last_updated in the
> rrd and looking at what the rrdcached daemon returns to a PENDING
> command for the same file and I'm sure the flushing works. The daemon
> just doesn't receive newer updates. The journal is also in sync with all
> this.  If I restart collectd, then the lag pretty quickly vanishes. So
> either collectd has some queue with a bug, or data is queued in the
> socket between collectd and rrdcached.

Thorsten,

This sounds like collectd not sending updates to rrdcached.  If they are
not in the journal, then rrdcached has not received them.

> The second issue, which may possibly cause the first one is that the cpu 
> consumed by rrdcached is way too high. After running for about an hour 
> it consumes a full cpu (~90% user + ~10% system). It could be that 
> that's causing the above lag, dunno.

If the updates are all being sent by librrd, then each file update
requires one read() and one write() system call.  The small read/write
syscalls may increase CPU utilization.  "BATCH" mode was introduced to
deal with this.

There is no way to use BATCH from within the librrd; it is only available
to users willing to speak the wire protocol directly.

In my environment, I am bursting to 3.4k updates/sec (800 avg).  rrdcached
uses very little CPU during these bursts (<0.2%).  I am writing directly
to the daemon using "BATCH".

> I/O is not a problem as I mentioned, it's pure CPU. I've compiled
> rrdcched with -pg to get gprof output, but haven't been successful. I
> commented out install_signal_handlers (left the USR2 to be able to
> terminate gracefully) and ran with -g, but the gprof output shows only
> ~2 minutes of CPU time profiled when the daemon accumulated
> >250mins. Here's the top of the output:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  55.12     62.39    62.39 280843249     0.00     0.00  buffer_get_field
>  11.33     75.22    12.83 93607575     0.00     0.00  send_response
>  10.32     86.91    11.68 93464852     0.00     0.00  handle_request_update
>   5.36     92.97     6.06                             connection_thread_main
>   4.03     97.53     4.57 93683555     0.00     0.00  handle_request
>   3.46    101.46     3.92 93484712     0.00     0.00  check_file_access
>   3.29    105.18     3.72 176583057     0.00     0.00  next_cmd
>   1.33    106.69     1.51 93686967     0.00     0.00  find_command
>   1.23    108.08     1.40 88419974     0.00     0.00  journal_write
>   1.00    109.22     1.14 93672403     0.00     0.00  has_privilege
> It looks to like that's mostly the journal replay stuff and very little 
> more. If someone has tips on how to get real profiling output, I'm all ears.

I've never had this sort of problem...  What is your `uname -a` ?

> The journal replay is too slow. When I terminate the daemon it leaves 
> several GB of journal files behind. Reading those in takes the better 
> part of an hour, during which the daemon is unresponsive.

That seems awfully long.  Mine is able to replay 83M entries (about 8GB)
in 7 minutes.

> Most of time is in buffer_get_field. (Note: in the most common cases
> buffer_get_field copies each field in-place, character by
> character. Seems to me that a simple if statement could avoid the
> writes.)

Agreed that buffer_get_field implementation is not optimal.  From what I
can tell, it copies this way for three reasons:

(1) desire to provide null terminated string to caller
(2) do not want to modify original string (in case need to write it to journal)
(3) allow escaped characters (presumably space)

(1) with (2) implies that we have to copy, rather than use the source
string.

(3) implies that we have to handle \ specially, but not necessarily that
we have to copy character by character.  I would expect '\' to appear
infrequently.  The only case I can imagine is if the RRD file path
contains spaces.

It's possible to implement it with strncpy, but performance depends on
whether the optimizations in strncpy offset the need to pre-scan the
string for delimiters.

If we were OK with modifying the string in place, we could do it without
copying...  Except, if we want to then write the original string to
journal we have to keep a copy.  More journal entries are written than
read, so I would avoid this.

> By the way, I find the unix socket stuff undebuggable. I switched to TCP 
> sockets because I can telnet to the socket and find out what the daemon 
> is doing. (For example, when nothing seems to work for almost an hour 
> when I start the daemon because it's replaying logs there is no 
> information about what's going on anywhere.) I'm saying this because 
> everyone recommends the unix sockets for security reasons. It's unusable 
> IMHO.

It is possible to use Unix sockets and TCP sockets simultaneously.

FYI netcat works just fine for talking on a unix socket:  nc -U /wherever/sock

YMMV, but I am able to get 2x as many updates when using a Unix socket on
FreeBSD 7.x.

-- 
 kevin brintnall =~ /kbrint at rufus.net/