[rrd-developers] rrdcached use corrupting RRD files (trunk)
s.shipway at auckland.ac.nz
Fri Oct 22 05:09:36 CEST 2010
Now that's a big setup.
I think the corruption results from the code not handling out-of-memory conditions correctly, so if your version isn't leaking memory you won't be affected, even if the same bug is present in the version you're using. The big problem is the memory leak, and I guess I'll need to learn to use valgrind to track it down.
This points the finger at the newer code as the cause of any leaks (most likely last and info, since create is rarely done), though a couple of additional changes between r2092 and r2136 could also be to blame.
I might try out the -a option; we've not used it yet as it's a new one in 1.4.trunk
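For context on why -a can matter: the rrdcached manual describes it as allocating value pointers in chunks of alloc_size (default 1), trading a little memory for fewer realloc() calls. The rough sketch below only counts how often a per-file pointer array would have to grow under that model; it is not the daemon's code. The function name and the 10-second update interval are assumptions made for illustration.

```python
import math


def pointer_array_growths(values_per_file: int, alloc_size: int = 1) -> int:
    """Count how many times a per-file pointer array must grow to hold
    values_per_file queued updates when it grows alloc_size slots at a time.

    Hypothetical helper for illustration; not part of rrdcached.
    """
    if values_per_file < 0 or alloc_size < 1:
        raise ValueError("values_per_file must be >= 0 and alloc_size >= 1")
    return math.ceil(values_per_file / alloc_size)


# Assumption: one update per file every 10 seconds, cached for -w 3600 seconds.
queued = 3600 // 10

print(pointer_array_growths(queued, alloc_size=1))    # 360
print(pointer_array_growths(queued, alloc_size=128))  # 3
```

Under these assumptions each file needs about 3 array growths per write cycle instead of 360, which is the kind of saving that would show up on a daemon handling tens of thousands of files.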
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 924 6487
Mobile: +64 (0)21 753 189
Email: s.shipway at auckland.ac.nz
From: Thorsten von Eicken [mailto:tve at voneicken.com]
Sent: Friday, 22 October 2010 3:13 p.m.
To: Steve Shipway
Cc: kevin brintnall; rrd-developers at lists.oetiker.ch; rrd-users at lists.oetiker.ch
Subject: Re: [rrd-developers] rrdcached use corrupting RRD files (trunk)
As a separate data point, we're running over 100 rrdcached servers, each handling >30k tree nodes and receiving about 3k updates/sec, caching data for ~1 hour so updating files at ~20 updates/sec. Uptime in months without problem, never seen corruption (knock on wood). We're running 1.4 trunk revision r2092 (randomly picked) on Ubuntu 8.04 (used to run on CentOS 5.2, I believe). We're not seeing any memory leak and running stable at 800-900MB virtual / 500-600MB rss. We're using TCP sockets and doing updates, fetches and flushes. The command line we use is:
/usr/bin/rrdcached -w 3600 -z 3600 -f 7200 -t 2 -a 128 -b /rrds/hosts -B -j /rrds/journal -p /var/run/rrdcached/rrdcached.pid -l 10.x.x.x:xxxx
I'm not writing this to contradict you, I'm just wondering what could be different in your set-up that causes the problems. (Oh, that reminds me that the -a 128 made a huge difference for us around memory allocation performance.)
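For anyone trying to reproduce a TCP set-up like the one described above, here is a minimal sketch of a client for rrdcached's line-based text protocol (UPDATE and FLUSH commands, and a status line whose leading integer is negative on error or otherwise gives the number of lines that follow). The helper names, the example file path, and the host and port are all hypothetical; this is not the client either poster actually uses.

```python
import socket


def format_update(path, timestamp, values):
    """Build an UPDATE command line: UPDATE <file> <time>:<v1>[:<v2>...]"""
    joined = ":".join(str(v) for v in values)
    return "UPDATE {} {}:{}\n".format(path, timestamp, joined)


def format_flush(path):
    """Build a FLUSH command line, asking the daemon to write one file out."""
    return "FLUSH {}\n".format(path)


def parse_status(line):
    """Split a status line into (code, message).

    A negative code signals an error; a code >= 0 is the number of
    additional response lines that follow the status line.
    """
    parts = line.strip().split(None, 1)
    code = int(parts[0])
    message = parts[1] if len(parts) > 1 else ""
    return code, message


def send_command(host, port, command, timeout=5.0):
    """Send one command over TCP and return (code, message, extra_lines)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as stream:
            stream.write(command.encode("ascii"))
            stream.flush()
            code, message = parse_status(stream.readline().decode("ascii"))
            extra = [stream.readline().decode("ascii").rstrip("\n")
                     for _ in range(max(code, 0))]
    return code, message, extra
```

A call might look like send_command("192.0.2.10", 42217, format_flush("hosts/example.rrd")), where the address, port, and file name are placeholders. With -B in effect, paths outside the -b base directory should be rejected by the daemon.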