<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Sadly interesting...<br>
As a separate data point, we're running over 100 rrdcached servers,
each handling &gt;30k tree nodes and receiving about 3k updates/sec.
We cache data for ~1 hour, so files get updated at ~20 updates/sec.
Uptime is measured in months without problems, and we have never seen
corruption (knock on wood). We're running 1.4 trunk revision r2092
(randomly picked) on Ubuntu 8.04 (it used to run on CentOS 5.2, I
believe). We're not seeing any memory leak and run stable at
800-900MB virtual / 500-600MB RSS. We're using TCP sockets and doing
updates, fetches and flushes. The command line we use is:<br>
/usr/bin/rrdcached -w 3600 -z 3600 -f 7200 -t 2 -a 128 -b
/rrds/hosts -B -j /rrds/journal -p /var/run/rrdcached/rrdcached.pid
-l 10.x.x.x:xxxx<br>
I'm not writing this to contradict you; I'm just wondering what
could be different in your setup that causes the problems. (Oh,
that reminds me: the -a 128 made a huge difference for us in
memory allocation performance.)<br>
Good luck!<br>
TvE<br>
<br>
On 10/21/2010 6:50 PM, Steve Shipway wrote:
<blockquote
cite="mid:28E447343A85354483BCF7C3E9D5EAA5149A37AB@uxcn10-1.UoA.auckland.ac.nz"
type="cite">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Webdings;
        panose-1:5 3 1 2 1 5 9 6 7 3;}
@font-face
        {font-family:"Arial Narrow";
        panose-1:2 11 6 6 2 2 2 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
-->
</style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">The
corrupted file ends up the correct size; however, the entire
file is filled with zeroes. (Fortunately, we archive our RRD
files nightly, so I can go back and retrieve both the last
uncorrupted version and the corrupted version.)</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">The
system is not (normally) memory- or process-constrained;
in fact, nothing to speak of is running apart from Apache and the
rrdcached daemon. The rrdinfo response is 'not an RRD file',
since the file doesn't have the RRD header.</span></p>
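<p class="MsoNormal">A minimal sketch for spotting files in this state,
assuming the on-disk RRD header begins with the four-byte cookie RRD\0
(as defined in rrd_format.h). The names classify_rrd and scan are
hypothetical helpers, not part of rrdtool.</p>
<pre>
```python
import os

# Assumption: a valid RRD file starts with these four bytes.
RRD_COOKIE = b"RRD\x00"


def classify_rrd(path, chunk_size=65536):
    """Return 'ok', 'empty', 'zero-filled' or 'bad-header' for one file."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as fh:
        if fh.read(len(RRD_COOKIE)) == RRD_COOKIE:
            return "ok"
        # The header is wrong; check whether the whole file is zeroes.
        fh.seek(0)
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return "zero-filled"
            if any(chunk):
                return "bad-header"


def scan(directory):
    """Map each *.rrd file name in a directory to its classification."""
    return {
        name: classify_rrd(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.endswith(".rrd")
    }
```
</pre>
<p class="MsoNormal">Running scan() over the RRD directory after a
suspected incident would list the zero-filled files that need restoring
from the nightly archive.</p>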
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">It
ran fine for a whole week at these rates before the
problem hit, which is why I think it might be a leak in the
RRD functions (which would of course not show up in a non-daemon
situation). We use remote update, info and (occasionally) create via
the TCP socket, plus info, last, flush and fetch via the UNIX socket.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">The
build is the absolute latest, r2136.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">The
memory usage of the rrdcached process is definitely
increasing; however, that may also be due to the number of
items in the queue. It is currently at 768m virtual, 560m physical
(17% usage), which seems somewhat high to me, even for 20,000+ RRD
files. Eventually it will hit address-space limits (this is a 32-bit
RHEL5 box with 4G of physical memory).</span></p>
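<p class="MsoNormal">Even without dedicated leak-tracking tools, the
growth can be logged over time. A rough sketch, assuming a Linux /proc
filesystem (as on RHEL5); parse_vm_fields, sample and log_memory are
hypothetical helper names, and the sampling interval is an arbitrary
choice.</p>
<pre>
```python
import time


def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (in kB) from /proc/PID/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  786432 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


def sample(pid):
    """Read the current memory figures for one process (Linux only)."""
    with open("/proc/%d/status" % pid) as fh:
        return parse_vm_fields(fh.read())


def log_memory(pid, interval=300, count=12):
    """Print one timestamped sample every `interval` seconds."""
    for _ in range(count):
        mem = sample(pid)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), mem["VmSize"], mem["VmRSS"])
        time.sleep(interval)
```
</pre>
<p class="MsoNormal">A VmSize that keeps climbing while the queue length
stays flat would point towards a leak rather than queued updates.</p>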
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">Unfortunately,
I don't have any of the nice developer tools
for tracking memory leaks...</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">Steve</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<div class="MsoNormal" style="text-align: center;"
align="center"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);" lang="EN-US">
<hr width="100%" align="center" size="2">
</span></div>
<p class="MsoNormal"><b><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">Steve Shipway</span></b></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">ITS Unix Services Design Lead</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">University of Auckland, New Zealand</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">Floor 1, 58 Symonds Street, Auckland</span></p>
<p class="MsoNormal"><i><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(89, 89, 89);">Phone: +64 (0)9 3737599 ext 86487</span></i></p>
<p class="MsoNormal"><i><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(89, 89, 89);">DDI: +64 (0)9 924 6487</span></i></p>
<p class="MsoNormal"><i><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(89, 89, 89);">Mobile: +64 (0)21 753 189</span></i></p>
<p class="MsoNormal"><i><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(89, 89, 89);">Email: <a moz-do-not-send="true"
href="mailto:s.shipway@auckland.ac.nz"><span
style="color: rgb(89, 89, 89);">s.shipway@auckland.ac.nz</span></a></span></i></p>
<p class="MsoNormal"><i><span style="font-size: 10pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></i></p>
<p class="MsoNormal"><span style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">&nbsp;</span></p>
<div style="border-width: medium medium medium 1.5pt;
border-style: none none none solid; border-color:
-moz-use-text-color -moz-use-text-color -moz-use-text-color
blue; padding: 0cm 0cm 0cm 4pt;">
<div>
<div style="border-right: medium none; border-width: 1pt
medium medium; border-style: solid none none;
border-color: rgb(181, 196, 223) -moz-use-text-color
-moz-use-text-color; padding: 3pt 0cm 0cm;">
<p class="MsoNormal"><b><span style="font-size: 10pt; font-family: 'Tahoma','sans-serif';"
lang="EN-US">From:</span></b><span style="font-size: 10pt; font-family: 'Tahoma','sans-serif';"
lang="EN-US"> kevin brintnall
[<a class="moz-txt-link-freetext" href="mailto:kbrint@rufus.net">mailto:kbrint@rufus.net</a>] <br>
<b>Sent:</b> Friday, 22 October 2010 1:40 p.m.<br>
<b>To:</b> Steve Shipway<br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:rrd-developers@lists.oetiker.ch">rrd-developers@lists.oetiker.ch</a>;
<a class="moz-txt-link-abbreviated" href="mailto:rrd-users@lists.oetiker.ch">rrd-users@lists.oetiker.ch</a><br>
<b>Subject:</b> Re: [rrd-developers] rrdcached use
corrupting RRD files (trunk)</span></p>
</div>
</div>
<p class="MsoNormal">&nbsp;</p>
<p class="MsoNormal">Sebastian,</p>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal">I don't think the problem is specific
to rrdcached; it uses the normal librrd API. This problem likely
affects any RRD access on a memory-constrained system.</p>
</div>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal">Is there a lack of memory (or address
space, if 32-bit) on the system? Or is it running up against
per-process limits?</p>
</div>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal">How does the file end up? Is it the
right size? What errors do you get (e.g. when you run "rrdtool info")?
What architecture are you running on? mmap() under failure
conditions is likely to be OS-specific.</p>
</div>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal">What revision of trunk?</p>
</div>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal">Let us know what you find re: memory
leak.</p>
</div>
<div>
<p class="MsoNormal">&nbsp;</p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom: 12pt;">-kb<o:p></o:p></p>
<div>
<p class="MsoNormal">On Thu, Oct 21, 2010 at 5:07 PM,
Steve Shipway &lt;<a moz-do-not-send="true"
href="mailto:s.shipway@auckland.ac.nz">s.shipway@auckland.ac.nz</a>&gt;
wrote:</p>
<div>
<div>
<p class="MsoNormal" style="">I've
had this happen too often now for it to be a fluke.
OK, so I'm using the trunk version of rrdtool 1.4, but (as far
as I know) there is nothing in there that modifies the update code.
We have a high update frequency (approx. 20,000 MRTG targets at
5-minute intervals, which equates to about 70 updates per second),
and it took about a week for the problem to first hit.</p>
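<p class="MsoNormal" style="">The rate quoted above can be checked with
a little arithmetic. A small sketch, assuming each target is updated
exactly once per interval; the helper names are hypothetical.</p>
<pre>
```python
def updates_per_second(targets, interval_seconds):
    """Average update rate if every target is updated once per interval."""
    return targets / interval_seconds


def updates_in(targets, interval_seconds, period_seconds):
    """Total updates over a period, counting whole intervals only."""
    return targets * (period_seconds // interval_seconds)


rate = updates_per_second(20000, 5 * 60)
week = updates_in(20000, 5 * 60, 7 * 24 * 3600)
print(round(rate, 1))  # 66.7, i.e. roughly 70 per second
print(week)            # 40320000 updates in one week
```
</pre>
<p class="MsoNormal" style="">So the daemon had handled on the order of
40 million updates before the problem first appeared.</p>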
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">It
seems that something is happening on update,
possibly involving memory
allocation failure, that results in a corrupted
file.<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">I
have some processes that may be reading the files
without going through rrdcached,
but all updates are certainly going this way (no
data collection is run on this
server any more; it all comes over TCP).</p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">Selected
error logs show:<o:p></o:p></p>
<p class="MsoNormal" style="">listen_thread_main:
pthread_create failed.<o:p></o:p></p>
<p class="MsoNormal" style="">queue_thread_main:
rrd_update_r (/u01/rrdtool/maildelivery-mx1.rrd)
failed with status -1.
(mmaping file '/u01/rrdtool/maildelivery-mx1.rrd':
Cannot allocate memory)<o:p></o:p></p>
<p class="MsoNormal" style=""><i>
(restarted rrdcached here)</i><o:p></o:p></p>
<p class="MsoNormal" style="">replaying
from journal:
/u01/rrdtool/journal/rrd.journal.1285603416.766523<o:p></o:p></p>
<p class="MsoNormal" style="">Replayed
61011 entries (0 failures)<o:p></o:p></p>
<p class="MsoNormal" style="">replaying
from journal:
/u01/rrdtool/journal/rrd.journal.1285607016.766153<o:p></o:p></p>
<p class="MsoNormal" style="">Malformed
journal entry at line 31024<o:p></o:p></p>
<p class="MsoNormal" style="">Replayed
31023 entries (1 failures)<o:p></o:p></p>
<p class="MsoNormal" style="">journal
processing complete<o:p></o:p></p>
<p class="MsoNormal" style="">queue_thread_main:
rrd_update_r (/u01/rrdtool/maildelivery-mx1.rrd)
failed with status -1.
('/u01/rrdtool/maildelivery-mx1.rrd' is not an RRD
file)<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">Although
there was only one journal failure, there were in
fact several RRD files
corrupted (I suspect the ones which were open at the
time of the memory
failure?) and even more with the rrd_update_r memory
allocation failure.<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">It
seems that the memory ran out (memory leak?) and
something in rrd_update_r was left half-done. The resulting
corrupted RRD file doesn't even load in rrdtool; it seems the
header is corrupt. I don't (yet) understand enough of the mmap
code to work out what could be causing this. I'm also trying to
track the memory usage of the rrdcached process to see if it is
indeed growing due to a leak.</p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">I
think there are two bugs here – first, the memory
leak causing the
failure, and second, something in the code is not
correctly handling a memory
allocation failure and corrupts the RRD file as a
result.<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">Has
anyone else experienced this? And, more to the
point, any RRD developers
who understand the MMAP update code want to take a
look or give some pointers?<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
<p class="MsoNormal" style="">Steve<o:p></o:p></p>
<p class="MsoNormal" style=""> <o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="margin-bottom: 12pt;"><br>
_______________________________________________<br>
rrd-developers mailing list<br>
<a moz-do-not-send="true"
href="mailto:rrd-developers@lists.oetiker.ch">rrd-developers@lists.oetiker.ch</a><br>
<a moz-do-not-send="true"
href="https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers"
target="_blank">https://lists.oetiker.ch/cgi-bin/listinfo/rrd-developers</a><o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom: 12pt;"><br>
-- <br>
kevin brintnall =~ /kbrint@rufus.net/</p>
</div>
</div>
</div>
</blockquote>
</body>
</html>