[rrd-developers] rrdcached use corrupting RRD files (trunk)

kevin brintnall kbrint at rufus.net
Fri Oct 22 02:39:49 CEST 2010


Sebastian,

I don't think the problem is specific to rrdcached; it uses the normal librrd
API.  This problem likely affects any RRD access on a memory-constrained
system.

Is there a lack of memory (or address space if 32-bit) on the system?  Or is
it running up against per-process limits?
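
One quick way to answer the per-process limit question is to print the
address-space limit from the same environment (user, init script) that
starts rrdcached.  A minimal sketch in C, assuming a POSIX system that
provides RLIMIT_AS; print_as_limit is a hypothetical helper name, not part
of librrd:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value, treating RLIM_INFINITY as "unlimited". */
static void print_one(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long) value);
}

/* Hypothetical helper: report the soft and hard address-space limits of
 * the calling process.  Returns 0 on success, -1 if getrlimit() fails. */
int print_as_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit(RLIMIT_AS)");
        return -1;
    }
    print_one("address space soft limit", rl.rlim_cur);
    print_one("address space hard limit", rl.rlim_max);
    return 0;
}
```

This only reports the limits of the process that calls it, so it would
need to be run under the same conditions as the daemon to be meaningful.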

How does the file end up?  Is it the right size?  What errors do you get
(e.g. from "rrdtool info")?  What architecture are you running on?
The behavior of mmap() under failure conditions is likely to be OS-specific.
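
To look at how a file ended up without going through librrd, something
like the following could report its size and whether the header still
starts correctly.  This is a rough sketch that assumes an RRD file begins
with the 4-byte cookie "RRD\0" (which, as far as I know, is what the
"is not an RRD file" check tests); rrd_cookie_ok is a hypothetical helper
name, not a librrd function:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: check the first 4 bytes of a file against the
 * assumed RRD cookie "RRD\0" and report the file size.
 * Returns 1 if the cookie matches, 0 if it does not (or the file is too
 * short), and -1 on I/O error.  Sizes are reported as long, which is
 * enough for typical RRD files. */
int rrd_cookie_ok(const char *path, long *size_out)
{
    unsigned char head[4];
    size_t n;
    long size;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    /* Read the start of the header, then seek to the end for the size. */
    n = fread(head, 1, sizeof head, fp);
    if (fseek(fp, 0L, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    if (size_out != NULL)
        *size_out = size;
    if (n != sizeof head)
        return 0;               /* too short to hold the cookie */
    return memcmp(head, "RRD\0", sizeof head) == 0 ? 1 : 0;
}
```

Comparing the reported size against a known-good RRD with the same
definition would show whether the file was truncated or only overwritten.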

What revision of trunk?

Let us know what you find re: memory leak.
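
For tracking the suspected leak, sampling the daemon's virtual and
resident size at intervals may be enough to see a trend.  A small sketch,
assuming a Linux-style /proc filesystem where /proc/<pid>/status contains
lines such as "VmSize:  12345 kB"; read_status_kb is a hypothetical helper
name and other systems will need a different approach:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of a field such as
 * "VmSize" or "VmRSS" from a Linux /proc/<pid>/status file, or -1 if the
 * file cannot be read or the field is not present. */
long read_status_kb(const char *status_path, const char *field)
{
    char line[256];
    long kb = -1;
    size_t flen = strlen(field);
    FILE *fp = fopen(status_path, "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Match "<field>:" at the start of the line. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            if (sscanf(line + flen + 1, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(fp);
    return kb;
}
```

Calling it with the status path of the rrdcached pid and logging "VmSize"
and "VmRSS" every few minutes should show whether the process grows
steadily between restarts.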

-kb

On Thu, Oct 21, 2010 at 5:07 PM, Steve Shipway <s.shipway at auckland.ac.nz> wrote:

>  I’ve had this happen too often now for it to be a fluke.  OK, so I’m
> using the trunk version of rrdtool 1.4, but (as far as I know) there is
> nothing in there to modify the update code.  We have a high update frequency
> – approx. 20,000 MRTG targets at 5min intervals, which equates to about 70
> updates per second, and it took about a week for the problem to first hit.
>
> It seems that something is happening on update, possibly involving memory
> allocation failure, that results in a corrupted file.
>
> I have some processes that may be reading the file without going through
> rrdcached, but all updates are certainly going this way (no data collection
> is run on this server any more; it all comes over TCP).
>
> Selected error logs show:
>
> listen_thread_main: pthread_create failed.
> queue_thread_main: rrd_update_r (/u01/rrdtool/maildelivery-mx1.rrd) failed
> with status -1. (mmaping file '/u01/rrdtool/maildelivery-mx1.rrd': Cannot
> allocate memory)
>
> (restarted rrdcached here)
>
> replaying from journal: /u01/rrdtool/journal/rrd.journal.1285603416.766523
> Replayed 61011 entries (0 failures)
> replaying from journal: /u01/rrdtool/journal/rrd.journal.1285607016.766153
> Malformed journal entry at line 31024
> Replayed 31023 entries (1 failures)
> journal processing complete
> queue_thread_main: rrd_update_r (/u01/rrdtool/maildelivery-mx1.rrd) failed
> with status -1. ('/u01/rrdtool/maildelivery-mx1.rrd' is not an RRD file)
>
> Although there was only one journal failure, there were in fact several RRD
> files corrupted (I suspect the ones which were open at the time of the
> memory failure?) and even more with the rrd_update_r memory allocation
> failure.
>
> It seems that the memory ran out (memory leak?) and somewhere in
> rrd_update_r something was left half-done.  The resulting corrupted RRD file
> doesn’t even load in rrdtool; it seems the header is corrupt – I don’t (yet)
> understand enough of the mmap code to work out what could be causing this.
> I’m also trying to track the memory usage of the rrdcached process to see if
> it is indeed growing due to a leak.
>
> I think there are two bugs here – first, the memory leak causing the
> failure, and second, something in the code is not correctly handling a
> memory allocation failure and corrupts the RRD file as a result.
>
> Has anyone else experienced this?  And, more to the point, any RRD
> developers who understand the MMAP update code want to take a look or give
> some pointers?
>
> Steve
>
> ------------------------------
> Steve Shipway
> ITS Unix Services Design Lead
> University of Auckland, New Zealand
> Floor 1, 58 Symonds Street, Auckland
> Phone: +64 (0)9 3737599 ext 86487
> DDI: +64 (0)9 924 6487
> Mobile: +64 (0)21 753 189
> Email: s.shipway at auckland.ac.nz


-- 
 kevin brintnall =~ /kbrint at rufus.net/