[rrd-developers] rrdcached + collectd issues
Thorsten von Eicken
tve at voneicken.com
Sat Oct 10 01:41:55 CEST 2009
Kevin, thanks for your thoughts.
kevin brintnall wrote:
> Hi Thorsten,
>
> On Fri, Oct 09, 2009 at 11:46:26AM -0700, Thorsten von Eicken wrote:
>> I've been trying to get rrdcached and collectd to work together under
>> load and am running into a number of issues.
>> I'm using:
>> - collectd 4.7.4
>> - rrdtool trunk rev 1889 (had trouble compiling collectd with newer
>> versions)
>> - rrdcached stats snapshot:
>> 9 Statistics follow
>> QueueLength: 0
>> UpdatesReceived: 91123595
>> FlushesReceived: 83983
>> UpdatesWritten: 210810
>> DataSetsWritten: 82474667
>
> It looks like your RRD files must have a very large number of DS? Almost
> 400?
No, in fact most have 1 (that's how collectd likes it). You may be
looking at the recevied/written ratio which is skewed by journal replay
and the very long cache period I configure (1 hour).
>> TreeNodesNumber: 25925
>> TreeDepth: 17
>> JournalBytes: 6846161193
>> JournalRotate: 4
>> - approx 3k updates written to rrdcached per second
>> - approx 200-300KB written to journal per second
>> - approx 2k-3k data sets written per second
>> - rrdached params: -w 3600 -z 3600 -f 7200 -t 10
>> - disk I/O is not an issue
>> - rrdcached memory usage is not an issue (grows to 0.8GB then stays
>> totally flat), no swapping
>> - running collectd, rrdcached, and custom graphing app on same
>> dual-core server, verified that flushing for graphing is working properly
>>
>> First issue is that over time the data in the rrd files lag behind the
>> data arriving into collectd in the network. After 12 hours I see approx
>> a 5 minute lag. I've seen it go to >1.5 hrs after a bunch of days. The
>> symptoms are that data in the rrd files continues to advance at the
>> normal rate (20 second interval in our case) but just in the past. The
>> fact that the delay is steady leads me to believe that it's a program
>> bug (I've seen delays due to I/O overload in the past and different rrds
>> then show different lags and jump forward when they finally get some
>> disk attention). I've done some tests looking at the last_updated in the
>> rrd and looking at what the rrdcached daemon returns to a PENDING
>> command for the same file and I'm sure the flushing works. The daemon
>> just doesn't receive newer updates. The journal is also in sync with all
>> this. If I restart collectd, then the lag pretty quickly vanishes. So
>> either collectd has some queue with a bug, or data is queued in the
>> socket between collectd and rrdcached.
>
> Thorsten,
>
> This sounds like collectd not sending updates to rrdcached. If they are
> not in the journal, then rrdcached has not received them.
Yes, the question is whether it's collectd's fault or rrdcached's fault..
>> The second issue, which may possibly cause the first one is that the cpu
>> consumed by rrdcached is way too high. After running for about an hour
>> it consumes a full cpu (~90% user + ~10% system). It could be that
>> that's causing the above lag, dunno.
>
> If the updates are all being sent by librrd, then each file update
> requires one read() and one write() system call. The small read/write
> syscalls may increase CPU utilization. "BATCH" mode was introduced to
> deal with this.
Collectd doesn't use BATCH, but then, I'm using rrdcached to "batch"
updates into fewer writes. The caching is set to 1 hour, each rrd is
updated every 20 seconds. That seems to work fine.
> In my environment, I am bursting to 3.4k updates/sec (800 avg). rrdcached
> uses very little CPU during these bursts (<0.2%). I am writing directly
> to the daemon using "BATCH".
Yeah, as I mentioned above, I'm very steady at 3k updates received per sec.
>> I/O is not a problem as I mentioned, it's pure CPU. I've compiled
>> rrdcched with -pg to get gprof output, but haven't been successful. I
>> commented out install_signal_handlers (left the USR2 to be able to
>> terminate gracefully) and ran with -g, but the gprof output shows only
>> ~2 minutes of CPU time profiled when the daemon accumulated
>>> 250mins. Here's the top of the output:
>> Each sample counts as 0.01 seconds.
>> % cumulative self self total
>> time seconds seconds calls s/call s/call name
>> 55.12 62.39 62.39 280843249 0.00 0.00 buffer_get_field
>> 11.33 75.22 12.83 93607575 0.00 0.00 send_response
>> 10.32 86.91 11.68 93464852 0.00 0.00 handle_request_update
>> 5.36 92.97 6.06 connection_thread_main
>> 4.03 97.53 4.57 93683555 0.00 0.00 handle_request
>> 3.46 101.46 3.92 93484712 0.00 0.00 check_file_access
>> 3.29 105.18 3.72 176583057 0.00 0.00 next_cmd
>> 1.33 106.69 1.51 93686967 0.00 0.00 find_command
>> 1.23 108.08 1.40 88419974 0.00 0.00 journal_write
>> 1.00 109.22 1.14 93672403 0.00 0.00 has_privilege
>> It looks to like that's mostly the journal replay stuff and very little
>> more. If someone has tips on how to get real profiling output, I'm all ears.
>
> I've never had this sort of problem... What is your `uname -a` ?
Linux sketchy1.rightscale.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15
12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>> The journal replay is too slow. When I terminate the daemon it leaves
>> several GB of journal files behind. Reading those in takes the better
>> part of an hour, during which the daemon is unresponsive.
>
> That seems awfully long. Mine is able to replay 83M entries (about 8GB)
> in 7 minutes.
One thing I noticed is that it starts out quite fast and gradually slows
down. I may be able to run a more controlled experiment. The more I
think about it, I have the feeling some rrdcached data structure is
getting slower and slower as it grows. How big is your process? As I
mentioned, mine is 0.8GB.
>> Most of time is in buffer_get_field. (Note: in the most common cases
>> buffer_get_field copies each field in-place, character by
>> character. Seems to me that a simple if statement could avoid the
>> writes.)
>
> Agreed that buffer_get_field implementation is not optimal. From what I
> can tell, it copies this way for three reasons:
>
> (1) desire to provide null terminated string to caller
> (2) do not want to modify original string (in case need to write it to journal)
> (3) allow escaped characters (presumably space)
Mhh, I don't think my C has gotten that rusty. Please look at the code.
Here are significant snippets:
buffer = *buffer_ret;
field = *buffer_ret;
field[field_size] = buffer[buffer_pos];
That's in-place modification as far as I can tell. In fact, in most
cases (no \ escape chars) it actually reads and writes the same byte in
the above assignment, so it doesn't actually copy anything.
>> By the way, I find the unix socket stuff undebuggable. I switched to TCP
>> sockets because I can telnet to the socket and find out what the daemon
>> is doing. (For example, when nothing seems to work for almost an hour
>> when I start the daemon because it's replaying logs there is no
>> information about what's going on anywhere.) I'm saying this because
>> everyone recommends the unix sockets for security reasons. It's unusable
>> IMHO.
>
> FYI netcat works just fine for talking on a unix socket: nc -U /wherever/sock
Ah, I must have missed that somehow. Will try, thanks!
Thorsten
More information about the rrd-developers
mailing list